Mobile-TeleVision: Predictive Motion Priors for Humanoid Whole-Body Control
Chenhao Lu, Xuxin Cheng, Jialong Li, Shiqi Yang, Mazeyu Ji, Chengjing Yuan, Ge Yang, Sha Yi, Xiaolong Wang
TL;DR
This work addresses the tension between precise upper-body manipulation and robust locomotion in humanoid robots by decoupling upper-body and lower-body control and introducing Predictive Motion Priors (PMP), learned via a CVAE, to encode future upper-body motion. The lower-body policy is trained with PPO and receives the PMP representation as part of its state, while the upper body is controlled through IK or motion retargeting, enabling precise, load-bearing manipulation during movement. Across simulation and real-world experiments on Unitree H1 (and GR1 in simulation), PMP improves upper-body manipulation precision and stabilizes locomotion under perturbations, outperforming RL-only whole-body baselines. The approach supports teleoperation with separate or unified control schemes, advancing practical loco-manipulation for remote operation in diverse environments.
Abstract
Humanoid robots require both robust lower-body locomotion and precise upper-body manipulation. While recent Reinforcement Learning (RL) approaches provide whole-body loco-manipulation policies, they lack precise manipulation with high-DoF arms. In this paper, we propose decoupling upper-body control from locomotion, using inverse kinematics (IK) and motion retargeting for precise manipulation, while RL focuses on robust lower-body locomotion. We introduce PMP (Predictive Motion Priors), trained with a Conditional Variational Autoencoder (CVAE) to effectively represent upper-body motions. The locomotion policy is trained conditioned on this upper-body motion representation, ensuring that the system remains robust during both manipulation and locomotion. We show that the CVAE features are crucial for stability and robustness, and that our approach significantly outperforms RL-based whole-body control in precise manipulation. With precise upper-body motion and robust lower-body locomotion control, operators can remotely control the humanoid to walk around and explore different environments while performing diverse manipulation tasks.
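To make the conditioning idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a window of upper-body joint targets is flattened into a vector, that a Gaussian encoder maps it to a latent mean and log-variance, and that the latent mean is appended to the locomotion policy's observation. The class name, the function names, all dimensions, and the untrained random weights are hypothetical; the abstract does not specify the network architecture, the contents of the motion window, or the observation layout.

```python
import math
import random


def linear(weights, bias, x):
    """Affine map y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


class ToyMotionEncoder:
    """Hypothetical Gaussian encoder over a flattened upper-body motion window."""

    def __init__(self, window_dim, latent_dim, seed=0):
        self.rng = random.Random(seed)
        self.w_mu = random_matrix(latent_dim, window_dim, self.rng)
        self.b_mu = [0.0] * latent_dim
        self.w_logvar = random_matrix(latent_dim, window_dim, self.rng)
        self.b_logvar = [0.0] * latent_dim

    def encode(self, motion_window):
        """Return the latent mean and log-variance for one motion window."""
        mu = linear(self.w_mu, self.b_mu, motion_window)
        logvar = linear(self.w_logvar, self.b_logvar, motion_window)
        return mu, logvar

    def sample(self, mu, logvar):
        """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1)."""
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, I) ), the usual VAE regularizer."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


def build_locomotion_observation(proprio, command, motion_latent):
    """Concatenate lower-body state, velocity command, and motion latent
    (an assumed layout, chosen only for illustration)."""
    return list(proprio) + list(command) + list(motion_latent)


if __name__ == "__main__":
    encoder = ToyMotionEncoder(window_dim=6, latent_dim=3)
    mu, logvar = encoder.encode([0.1] * 6)
    obs = build_locomotion_observation([0.0] * 4, [1.0, 0.0], mu)
    print(len(obs))
```

In this sketch the locomotion policy never sees raw arm trajectories; it only sees a compact latent summarizing the upper-body motion, which is the decoupling the abstract describes. Using the latent mean rather than a sample at deployment time is a design choice made here for determinism, not something stated in the source.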
