ExBody2: Advanced Expressive Humanoid Whole-Body Control
Mazeyu Ji, Xuanbin Peng, Fangchen Liu, Jialong Li, Ge Yang, Xuxin Cheng, Xiaolong Wang
TL;DR
ExBody2 addresses expressive, robust whole-body control for humanoid robots with a sim-to-real framework that pairs a generalist policy, learned from diverse retargeted motion data, with specialist policies finetuned for targeted motions. It combines automated data curation guided by a feasibility-diversity principle, a control strategy that decouples keypoint (motion) tracking from velocity tracking, and a teacher-student RL training pipeline that yields policies deployable on real hardware. Key contributions include automated filtering of motions by lower-body feasibility to balance feasibility against diversity, a two-stage training paradigm, and a velocity-based global tracking approach that preserves expressive motion. Experiments on a Unitree G1 report superior tracking fidelity and stability in both simulation and real-world tests, with specialist finetuning giving additional gains on challenging tasks and out-of-distribution (OOD) scenarios. Together, these components aim to make humanoid motion more expressive without sacrificing robustness in real-world environments.
Abstract
This paper tackles the challenge of enabling real-world humanoid robots to perform expressive and dynamic whole-body motions while maintaining overall stability and robustness. We propose Advanced Expressive Whole-Body Control (ExBody2), a method for producing whole-body tracking controllers that are trained on both human motion capture and simulated data and then transferred to the real world. We introduce a technique for decoupling the velocity tracking of the entire body from the tracking of body landmarks. We use a teacher policy to produce intermediate data that better conforms to the robot's kinematics and to automatically filter out infeasible whole-body motions. This two-step approach enabled us to produce a deployable student policy with which the robot can walk, crouch, and dance. We also provide insight into the trade-off between versatility and tracking performance on specific motions. We observed a significant improvement in tracking performance on the targeted motions after fine-tuning on a small amount of data, at the expense of performance on other motions.
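The automated filtering step can be illustrated with a minimal sketch. It assumes that each motion clip comes with per-frame lower-body tracking errors measured while a teacher policy tracks it, and that a clip is kept when its mean error stays under a threshold. The `MotionClip` type, the `filter_feasible` helper, the error values, and the 0.10 threshold are all hypothetical and chosen for illustration; they are not taken from the paper or its code.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class MotionClip:
    """Hypothetical container for one retargeted motion clip."""
    name: str
    # Assumed: per-frame lower-body tracking error from a teacher rollout.
    lower_body_errors: List[float]


def filter_feasible(clips: List[MotionClip], max_mean_error: float) -> List[MotionClip]:
    """Keep clips whose mean lower-body tracking error is within the threshold."""
    return [c for c in clips if mean(c.lower_body_errors) <= max_mean_error]


# Illustrative data only; the numbers are made up for this sketch.
clips = [
    MotionClip("walk", [0.02, 0.03, 0.025]),
    MotionClip("crouch", [0.05, 0.06, 0.055]),
    MotionClip("backflip", [0.40, 0.55, 0.60]),
]

kept = filter_feasible(clips, max_mean_error=0.10)
kept_names = [c.name for c in kept]
print(kept_names)
```

In this sketch, lowering `max_mean_error` favors feasibility (fewer, easier clips), while raising it favors diversity (more clips, including ones the teacher tracks poorly), which is one simple way to picture the feasibility-diversity trade-off described above.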
