Human-Humanoid Robots Cross-Embodiment Behavior-Skill Transfer Using Decomposed Adversarial Learning from Demonstration
Junjia Liu, Zhuo Li, Minghao Yu, Zhipeng Dong, Sylvain Calinon, Darwin Caldwell, Fei Chen
TL;DR
The paper addresses two obstacles in humanoid loco-manipulation, the data bottleneck and cross-embodiment transfer, by introducing a Unified Digital Human (UDH) as a common prototype and decomposing high-DoF control into learned behavior primitives via Decomposed Adversarial Imitation Learning (DAIL). It couples kinematic motion retargeting with a high-level policy, guided by an interaction graph, that plans latent behaviors coordinated across body parts. Embodiment-specific dynamic fine-tuning with an MLP then adapts the result to each robot. Key contributions include the 92-DoF UDH design, per-part adversarial learning of primitives, the interaction graph framework with a style discriminator, and a demonstration on five diverse humanoids showing improved data efficiency and transfer performance. The approach aims to enable rapid deployment of humanoid skills across platforms, reducing data needs while preserving natural, stable loco-manipulation behavior.
Abstract
Humanoid robots are envisioned as embodied intelligent agents capable of performing a wide range of human-level loco-manipulation tasks, particularly in scenarios requiring strenuous and repetitive labor. However, learning these skills is challenging because humanoid robots have many degrees of freedom, and collecting sufficient training data for humanoids is laborious. Given the rapid introduction of new humanoid platforms, a cross-embodiment framework that allows generalizable skill transfer is becoming increasingly critical. To address this, we propose a transferable framework that reduces the data bottleneck by using a unified digital human model as a common prototype, bypassing the need to retrain on every new robot platform. The model learns behavior primitives from human demonstrations through adversarial imitation, and the complex robot structures are decomposed into functional components, each trained independently and dynamically coordinated. Task generalization is achieved through a human-object interaction graph, and skills are transferred to different robots via embodiment-specific kinematic motion retargeting and dynamic fine-tuning. Our framework is validated on five humanoid robots with diverse configurations, demonstrating stable loco-manipulation and highlighting its effectiveness in reducing data requirements and increasing the efficiency of skill transfer across platforms.
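The decomposition idea in the abstract (functional components, each with its own adversarial imitation signal) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the partition of the 92 UDH DoFs into seven parts, the names PART_SLICES, make_discriminator and per_part_rewards, the linear discriminators with random weights, and the GAIL-style reward -log(1 - D). Only the total of 92 DoFs and the per-part structure come from the text above.

```python
import math
import random

# Hypothetical partition of the 92 UDH DoFs into body parts.
# The part names and sizes are illustrative assumptions, not the paper's.
PART_SLICES = {
    "torso_head": slice(0, 10),
    "left_arm": slice(10, 17),
    "right_arm": slice(17, 24),
    "left_hand": slice(24, 44),
    "right_hand": slice(44, 64),
    "left_leg": slice(64, 78),
    "right_leg": slice(78, 92),
}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_discriminator(dim, rng):
    """Placeholder linear discriminator with random weights.

    A real system would train one network per part on human demonstrations.
    """
    weights = [rng.uniform(-0.1, 0.1) for _ in range(dim)]

    def discriminate(features):
        # Returns a score in (0, 1): how "demonstration-like" the part pose looks.
        return sigmoid(sum(w * x for w, x in zip(weights, features)))

    return discriminate


def per_part_rewards(pose, discriminators):
    """Score each body part separately with its own discriminator.

    Uses the common GAIL-style reward -log(1 - D); this choice is an assumption.
    """
    rewards = {}
    for name, part_slice in PART_SLICES.items():
        score = discriminators[name](pose[part_slice])
        rewards[name] = -math.log(max(1.0 - score, 1e-8))
    return rewards


rng = random.Random(0)
discriminators = {
    name: make_discriminator(s.stop - s.start, rng)
    for name, s in PART_SLICES.items()
}
pose = [rng.uniform(-1.0, 1.0) for _ in range(92)]  # stand-in full-body pose
rewards = per_part_rewards(pose, discriminators)
```

Each entry of rewards would serve as the imitation signal for one part's primitive. Scoring parts separately keeps every discriminator's input small compared with a single discriminator over all 92 DoFs, which is one plausible motivation for decomposing the learning problem.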
