Expressive Whole-Body Control for Humanoid Robots
Xuxin Cheng, Yandong Ji, Junming Chen, Ruihan Yang, Ge Yang, Xiaolong Wang
TL;DR
The paper introduces ExBody, a goal-conditioned RL framework that enables a humanoid robot to perform expressive, diverse motions by learning from large-scale human mocap data while relaxing lower-body imitation. It combines data curation, motion retargeting to hardware, and a carefully designed reward structure to train a single policy that tracks root motion and upper-body expression, achieving robust sim-to-real transfer on a Unitree H1. Key contributions include (i) a data-driven retargeting and initialization pipeline, (ii) a dual-goal RL objective balancing expressivity with locomotion, and (iii) extensive simulation and real-world demonstrations of handshaking, dancing, and adaptive walking. This approach advances natural, interactive humanoid behavior and highlights the value of leveraging rich human motion datasets for real-world robotics.
Abstract
Can we enable humanoid robots to generate rich, diverse, and expressive motions in the real world? We propose to learn a whole-body control policy on a human-sized robot to mimic human motions as realistically as possible. To train such a policy, we leverage large-scale human motion capture data from the graphics community in a Reinforcement Learning framework. However, directly performing imitation learning with the motion capture dataset would not work on a real humanoid robot, given the large gap in degrees of freedom and physical capabilities. Our method, Expressive Whole-Body Control (ExBody), tackles this problem by encouraging the upper body of the humanoid to imitate a reference motion, while relaxing the imitation constraint on its two legs and only requiring them to follow a given velocity robustly. With training in simulation and Sim2Real transfer, our policy can control a humanoid robot to walk in different styles, shake hands with humans, and even dance with a human in the real world. We conduct extensive studies and comparisons on diverse motions in both simulation and the real world to show the effectiveness of our approach.
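
The split described above, with the upper body imitating a reference motion and the legs only following a commanded velocity, can be illustrated with a minimal reward sketch. This is not the reward used by the authors: the function names, the exponential error kernel, the weights, and the scale values are all assumptions chosen for illustration. The point it shows is that leg joint angles never enter the objective, so the legs are constrained only through the root velocity they produce.

```python
import math


def tracking_term(actual, target, scale):
    """Map a squared tracking error to (0, 1]; 1.0 means perfect tracking.

    The exponential kernel and `scale` are illustrative assumptions.
    """
    sq_err = sum((a - t) ** 2 for a, t in zip(actual, target))
    return math.exp(-sq_err / scale)


def dual_goal_reward(upper_joints, ref_upper_joints, root_vel, cmd_root_vel,
                     w_expr=0.5, w_root=0.5, expr_scale=1.0, root_scale=0.25):
    """Hypothetical two-part reward: upper-body expression plus root velocity.

    Leg joint angles are deliberately absent from the arguments, so the
    policy is free to pick any leg motion that achieves the commanded
    root velocity. Weights and scales are placeholder values.
    """
    expression = tracking_term(upper_joints, ref_upper_joints, expr_scale)
    root = tracking_term(root_vel, cmd_root_vel, root_scale)
    return w_expr * expression + w_root * root


# Perfect tracking of both goals yields the maximum reward.
best = dual_goal_reward([0.1, 0.2], [0.1, 0.2], [1.0, 0.0], [1.0, 0.0])

# Missing the commanded root velocity lowers the reward even when the
# upper body matches the reference exactly.
slow = dual_goal_reward([0.1, 0.2], [0.1, 0.2], [0.5, 0.0], [1.0, 0.0])
```

In this sketch, raising `w_expr` relative to `w_root` would favor expressive arm and torso motion over velocity tracking, which is one simple way to think about the balance between expressivity and locomotion.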
