HYPERmotion: Learning Hybrid Behavior Planning for Autonomous Loco-manipulation

Jin Wang; Rui Dai; Weijie Wang; Luca Rossini; Francesco Ruscelli; Nikos Tsagarakis

TL;DR

HYPERmotion tackles the challenge of long-horizon loco-manipulation by humanoids in unstructured environments. It integrates RL-based learning of whole-body motions with an optimization-based planner, a reusable motion library, and LLM/VLM grounding to map natural-language instructions to sequences of primitive actions. Key contributions include a four-sector methodology (motion generation, morphology selection, LLM-driven planning, and user prompts), sim-to-real deployment on a 38-DoF humanoid, and a morphology selector that leverages depth and 2D vision for ground-aware action selection, achieving zero-shot planning for diverse tasks. The framework demonstrates robust adaptation to new tasks and environments, enabling more autonomous and flexible human-robot collaboration, while identifying limits related to library size, retraining needs, and disturbance handling that guide future work.

Abstract

Enabling robots to autonomously perform hybrid motions in diverse environments can be beneficial for long-horizon tasks such as material handling, household chores, and work assistance. This requires extensive exploitation of intrinsic motion capabilities, extraction of affordances from rich environmental information, and planning of physical interaction behaviors. Although recent work has demonstrated impressive humanoid whole-body control abilities, these approaches struggle to achieve versatility and adaptability for new tasks. In this work, we propose HYPERmotion, a framework that learns, selects, and plans behaviors based on tasks in different scenarios. We combine reinforcement learning with whole-body optimization to generate motion for 38 actuated joints and create a motion library to store the learned skills. We apply the planning and reasoning features of large language models (LLMs) to complex loco-manipulation tasks, constructing a hierarchical task graph that comprises a series of primitive behaviors to bridge lower-level execution with higher-level planning. We leverage the interaction of distilled spatial geometry and 2D observation with a visual language model (VLM) to ground knowledge into a robotic morphology selector that chooses appropriate actions: single- or dual-arm manipulation, and legged or wheeled locomotion. Experiments in simulation and the real world show that the learned motions can efficiently adapt to new tasks, demonstrating high autonomy from free-text commands in unstructured scenes. Videos and website: hy-motion.github.io/
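
To make the relationship between the motion library, the task graph of primitive behaviors, and the morphology selector more concrete, the following is a minimal Python sketch. It is not the authors' implementation: the names `Primitive`, `MotionLibrary`, `select_morphology`, and `run_plan` are hypothetical, the task graph is flattened to an ordered list rather than a hierarchy, and the rule-based selector with its 0.3 m width threshold is an invented stand-in for the VLM-based selector described in the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Primitive:
    """One node of the (flattened) task graph: a skill name plus arguments."""
    skill: str
    args: Dict[str, str] = field(default_factory=dict)


class MotionLibrary:
    """Hypothetical registry mapping skill names to executable motions."""

    def __init__(self) -> None:
        self._skills: Dict[str, Callable[..., str]] = {}

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self._skills[name] = fn

    def execute(self, primitive: Primitive) -> str:
        if primitive.skill not in self._skills:
            raise KeyError(f"unknown skill: {primitive.skill}")
        return self._skills[primitive.skill](**primitive.args)


def select_morphology(object_width_m: float, terrain_flat: bool) -> Dict[str, str]:
    """Placeholder rule standing in for the VLM-based morphology selector.

    The 0.3 m threshold is an assumption made for illustration only.
    """
    return {
        "arm": "dual" if object_width_m > 0.3 else "single",
        "base": "wheeled" if terrain_flat else "legged",
    }


def run_plan(library: MotionLibrary, plan: List[Primitive]) -> List[str]:
    """Execute primitives in order and collect a description of each step."""
    return [library.execute(p) for p in plan]


# Stored skills are stubbed as functions that only describe the motion.
library = MotionLibrary()
library.register("move_to", lambda target, base: f"{base} locomotion to {target}")
library.register("grasp", lambda obj, arm: f"{arm}-arm grasp of {obj}")

# A planner (an LLM in the paper) would emit the plan; here it is hand-written.
mode = select_morphology(object_width_m=0.45, terrain_flat=True)
plan = [
    Primitive("move_to", {"target": "table", "base": mode["base"]}),
    Primitive("grasp", {"obj": "box", "arm": mode["arm"]}),
]
log = run_plan(library, plan)
print(log)
```

In this sketch the planner only ever refers to skills by name, so new learned motions could be added to the library without changing the planning code, which is one plausible reading of how a reusable motion library decouples planning from execution.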

Paper Structure (31 sections, 13 equations, 19 figures, 8 tables)

Introduction
Related Work
Methodology
Autonomous Loco-manipulation via HYPERmotion
Learning Whole-body Motion Generation
Humanoid Robot Task Planning with grounded language models
Experiment
Whole-body Trajectory Learning
Morphology Selection Towards Different Scenarios
Loco-manipulation Tasks with Language Model Planner
Conclusion
Robot System Setup
Robot hardware
Robot software
Details of Robot Learning
...and 16 more sections

Figures (19)

Figure 1: HYPERmotion enables the humanoid robot to learn, plan, and select behaviors to complete long-horizon tasks. Steps 1-5 illustrate how the robot, guided by foundation models, autonomously performs locomotion and manipulation after interpreting verbal instruction and chooses motion modes for different scenarios independently.
Figure 2: Overview of HYPERmotion. We decompose the framework into four sectors. Motion generation learns and trains whole-body motion skills for new tasks and stores them in the motion library. User input includes the received task instructions and the initialization prompt sets. Task planning generates a task graph that guides the robot's behavior through the reasoning and planning features of an LLM and passes action commands to the real robot. The Morphology Selector handles action determination in specific sub-tasks, selecting the appropriate morphology for locomotion and manipulation based on grounded spatial knowledge and the robot's intrinsic features.
Figure 3: Illustration of whole-body task learning in training, simulation, and real-world settings.
Figure 4: On receiving the language-conditioned task state, the Robotic Morphology Selector extracts spatial geometric data and 2D observations from the physical environment and interacts with the VLM, incorporating the robot’s grounded affordances, to provide the motion morphology that best meets the requirements of the given task scenario during manipulation and locomotion.
Figure 5: End-effector position trajectories when executing different tasks in various environments.
...and 14 more figures
