DemoHLM: From One Demonstration to Generalizable Humanoid Loco-Manipulation
Yuhui Fu, Feiyang Xie, Chaoyi Xu, Jing Xiong, Haoqi Yuan, Zongqing Lu
TL;DR
DemoHLM tackles generalizable humanoid loco-manipulation by combining a simulation-based data generation pipeline with a two-tier control hierarchy. From a single simulated demonstration, it synthesizes hundreds to thousands of trajectories spanning the locomotion, pre-manipulation, and manipulation stages, then uses them to train a high-level imitation-learning policy that drives a low-level whole-body controller. The approach transfers from simulation to a real Unitree G1 on ten tasks. Performance and generalization improve as the amount of synthetic data grows, and the pipeline is compatible with multiple behavior cloning (BC) architectures. Together, these results point to scalable, data-efficient learning of complex loco-manipulation tasks in real-world environments.
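The summary does not say how one demonstration becomes hundreds of trajectories. A common, generic way to do this is object-centric replay. Each end-effector waypoint of the demonstration is expressed relative to the object, the object is placed at a randomized pose, and the waypoints are mapped back to the world frame. The sketch below illustrates that idea only and is not a description of DemoHLM's actual pipeline. `Pose2D`, `retarget`, `synthesize`, the planar (x, y, yaw) simplification, and the sampling ranges are all hypothetical. A real pipeline would use 6-DoF poses, would likely regenerate the locomotion segment separately, and would keep only rollouts that succeed in simulation.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple


def wrap_angle(a: float) -> float:
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


@dataclass(frozen=True)
class Pose2D:
    """Hypothetical planar pose (x, y, yaw); a stand-in for full 6-DoF poses."""
    x: float
    y: float
    yaw: float

    def compose(self, other: "Pose2D") -> "Pose2D":
        """Return self * other: `other` is given in this frame, result is in the parent frame."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        return Pose2D(self.x + c * other.x - s * other.y,
                      self.y + s * other.x + c * other.y,
                      wrap_angle(self.yaw + other.yaw))

    def inverse(self) -> "Pose2D":
        """Return the inverse transform (R^T, -R^T t)."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        return Pose2D(-(c * self.x + s * self.y), s * self.x - c * self.y,
                      wrap_angle(-self.yaw))


def retarget(demo_ee: List[Pose2D], demo_obj: Pose2D, new_obj: Pose2D) -> List[Pose2D]:
    """Preserve each waypoint's pose relative to the object while moving the object."""
    obj_inv = demo_obj.inverse()
    # world -> object frame (via the demo object pose) -> world (via the new object pose)
    return [new_obj.compose(obj_inv.compose(p)) for p in demo_ee]


def synthesize(demo_ee: List[Pose2D], demo_obj: Pose2D, n: int, rng: random.Random,
               xy_range: float = 0.5,
               yaw_range: float = math.pi / 4) -> List[Tuple[Pose2D, List[Pose2D]]]:
    """Sample n randomized object placements and retarget the single demo to each one."""
    out = []
    for _ in range(n):
        new_obj = Pose2D(demo_obj.x + rng.uniform(-xy_range, xy_range),
                         demo_obj.y + rng.uniform(-xy_range, xy_range),
                         wrap_angle(demo_obj.yaw + rng.uniform(-yaw_range, yaw_range)))
        out.append((new_obj, retarget(demo_ee, demo_obj, new_obj)))
    return out
```

For example, `synthesize(demo_ee, demo_obj, 1000, random.Random(0))` yields 1000 object placements, each paired with a trajectory whose grasp approach relative to the object matches the original demonstration.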
Abstract
Loco-manipulation is a fundamental challenge for humanoid robots to achieve versatile interactions in human environments. Although recent studies have made significant progress in humanoid whole-body control, loco-manipulation remains underexplored and often relies on hard-coded task definitions or costly real-world data collection, which limits autonomy and generalization. We present DemoHLM, a framework that enables generalizable loco-manipulation on a real humanoid robot from a single demonstration in simulation. DemoHLM adopts a hierarchy that integrates a low-level universal whole-body controller with high-level manipulation policies for multiple tasks. The whole-body controller maps whole-body motion commands to joint torques and provides omnidirectional mobility for the humanoid robot. The manipulation policies, learned in simulation via our data generation and imitation learning pipeline, command the whole-body controller with closed-loop visual feedback to execute challenging loco-manipulation tasks. Experiments show a positive correlation between the amount of synthetic data and policy performance, underscoring the effectiveness of our data generation pipeline and the data efficiency of our approach. Real-world experiments on a Unitree G1 robot equipped with an RGB-D camera validate the sim-to-real transferability of DemoHLM, demonstrating robust performance under spatial variations across ten loco-manipulation tasks.
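The two-tier hierarchy described in the abstract can be pictured as a two-rate control loop. A high-level policy reads an observation and emits a motion command at a low rate. The whole-body controller converts the most recent command into joint torques at every control step. This is a minimal sketch under loudly stated assumptions. `WholeBodyCommand`, `whole_body_controller`, `step_sim`, and `run_hierarchy` are hypothetical names. A PD law stands in for the learned whole-body controller, and joint-position targets stand in for whole-body motion commands. The joint state passed to the policy stands in for RGB-D observations, and the decimation factor of 10 is arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class WholeBodyCommand:
    """Hypothetical high-level command; joint targets stand in for whole-body motion commands."""
    joint_targets: List[float]


def whole_body_controller(cmd: WholeBodyCommand, q: Sequence[float], dq: Sequence[float],
                          kp: float = 25.0, kd: float = 10.0) -> List[float]:
    """Low-level tier (PD stand-in for a learned controller): command -> joint torques."""
    return [kp * (qt - qi) - kd * dqi for qt, qi, dqi in zip(cmd.joint_targets, q, dq)]


def step_sim(q: List[float], dq: List[float], tau: List[float],
             dt: float = 0.01) -> Tuple[List[float], List[float]]:
    """Toy unit-inertia joints integrated with semi-implicit Euler."""
    dq = [v + t * dt for v, t in zip(dq, tau)]
    q = [p + v * dt for p, v in zip(q, dq)]
    return q, dq


def run_hierarchy(policy: Callable[[List[float]], WholeBodyCommand],
                  q: List[float], dq: List[float],
                  steps: int, decimation: int = 10) -> Tuple[List[float], List[float]]:
    """Query the high-level policy every `decimation` steps; run the controller every step."""
    cmd = None
    for t in range(steps):
        if t % decimation == 0:
            # Closed-loop update: the policy sees the latest observation (here, joint state).
            cmd = policy(q)
        tau = whole_body_controller(cmd, q, dq)
        q, dq = step_sim(q, dq, tau)
    return q, dq
```

With a policy that always returns `WholeBodyCommand([0.5, -0.2])`, `run_hierarchy(policy, [0.0, 0.0], [0.0, 0.0], steps=500)` queries the policy 50 times and drives both joints to their targets. Separating the rates this way lets a slower, vision-driven policy sit on top of a fast, stabilizing controller.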
