Mobile-TeleVision: Predictive Motion Priors for Humanoid Whole-Body Control

Chenhao Lu; Xuxin Cheng; Jialong Li; Shiqi Yang; Mazeyu Ji; Chengjing Yuan; Ge Yang; Sha Yi; Xiaolong Wang

TL;DR

This work addresses the tension between precise upper-body manipulation and robust locomotion in humanoid robots. It decouples the two control streams and introduces Predictive Motion Priors (PMP), learned with a CVAE to encode future upper-body motion. The lower-body policy is trained with PPO and receives PMP as an informative part of its state, while the upper body is controlled through IK or motion retargeting, enabling precise, load-bearing manipulation while the robot moves. In simulation and real-world experiments on the Unitree H1 (and on GR1 in simulation), PMP improves upper-body manipulation precision and stabilizes locomotion under perturbations, outperforming RL-only whole-body baselines. The approach supports teleoperation with either separate or unified control schemes, advancing practical loco-manipulation for remote operation in diverse environments.
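The TL;DR describes PMP as a CVAE latent over future upper-body motion that the lower-body policy consumes as part of its state. The following is a minimal NumPy sketch of that idea under stated assumptions: the dimensions, the single linear encoder and decoder, the KL weight `beta`, conditioning on current upper-body joint positions, and feeding the encoder mean to the policy are all hypothetical illustration choices, not the authors' architecture. It shows the reparameterization trick, the closed-form KL term of the CVAE loss, and how a latent could be concatenated with proprioception.

```python
import numpy as np

# Hypothetical sizes for illustration only; not taken from the paper.
UPPER_DOF = 10   # upper-body joints
HORIZON = 5      # future steps summarized by the prior
LATENT = 8       # size of the motion-prior latent z
PROPRIO = 19     # lower-body proprioception size

rng = np.random.default_rng(0)

def linear(n_in, n_out):
    """Randomly initialized affine layer (weights, bias); untrained."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

# Encoder q(z | future motion, cond) and decoder p(future motion | z, cond).
W_enc, b_enc = linear(HORIZON * UPPER_DOF + UPPER_DOF, 2 * LATENT)
W_dec, b_dec = linear(LATENT + UPPER_DOF, HORIZON * UPPER_DOF)

def encode(future, cond):
    """Return mean and log-variance of the Gaussian posterior over z."""
    h = np.concatenate([future.ravel(), cond]) @ W_enc + b_enc
    return h[:LATENT], h[LATENT:]

def reparameterize(mu, log_var):
    # z = mu + sigma * eps keeps sampling differentiable w.r.t. mu and log_var.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def decode(z, cond):
    """Reconstruct the (HORIZON, UPPER_DOF) future motion from z and cond."""
    return (np.concatenate([z, cond]) @ W_dec + b_dec).reshape(HORIZON, UPPER_DOF)

def cvae_loss(future, cond, beta=1e-3):
    """Reconstruction MSE plus beta-weighted KL to a standard normal prior."""
    mu, log_var = encode(future, cond)
    recon = decode(reparameterize(mu, log_var), cond)
    recon_loss = np.mean((recon - future) ** 2)
    # Closed-form KL(N(mu, sigma^2) || N(0, I)).
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))
    return recon_loss + beta * kl, recon_loss, kl

def policy_observation(proprio, future, cond):
    # Hypothetical policy input: proprioception concatenated with the latent mean.
    mu, _ = encode(future, cond)
    return np.concatenate([proprio, mu])

# Usage on random data.
future = rng.normal(size=(HORIZON, UPPER_DOF))
cond = rng.normal(size=UPPER_DOF)
total, recon, kl = cvae_loss(future, cond)
obs = policy_observation(np.zeros(PROPRIO), future, cond)
print(obs.shape)  # (27,)
```

Using the deterministic mean rather than a sampled `z` as the policy input is one common design choice for feeding a learned latent to a downstream controller; whether PMP does this is not stated here.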

Abstract

Humanoid robots require both robust lower-body locomotion and precise upper-body manipulation. While recent Reinforcement Learning (RL) approaches provide whole-body loco-manipulation policies, they lack precision when manipulating with high-DoF arms. In this paper, we propose decoupling upper-body control from locomotion, using inverse kinematics (IK) and motion retargeting for precise manipulation, while RL focuses on robust lower-body locomotion. We introduce PMP (Predictive Motion Priors), trained with a Conditional Variational Autoencoder (CVAE) to effectively represent upper-body motions. The locomotion policy is conditioned on this upper-body motion representation, ensuring that the system remains robust during both manipulation and locomotion. We show that CVAE features are crucial for stability and robustness, and that our approach significantly outperforms RL-based whole-body control in precise manipulation. With precise upper-body motion and robust lower-body locomotion control, operators can remotely control the humanoid to walk around and explore different environments while performing diverse manipulation tasks.
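The abstract assigns precise arm control to IK rather than RL. As a generic illustration of what an IK step computes, here is a damped least-squares solver for a hypothetical planar two-link arm. The link lengths, damping factor, and iteration limits are assumptions, and a real humanoid arm would use a full 3D kinematic model of the robot rather than this toy; the paper's actual IK formulation is not specified here.

```python
import numpy as np

LINK1, LINK2 = 0.3, 0.25  # hypothetical link lengths (m)

def forward_kinematics(q):
    """End-effector (x, y) of a planar 2-link arm with joint angles q = (q1, q2)."""
    x = LINK1 * np.cos(q[0]) + LINK2 * np.cos(q[0] + q[1])
    y = LINK1 * np.sin(q[0]) + LINK2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    """Analytic 2x2 Jacobian of forward_kinematics w.r.t. (q1, q2)."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-LINK1 * s1 - LINK2 * s12, -LINK2 * s12],
        [ LINK1 * c1 + LINK2 * c12,  LINK2 * c12],
    ])

def dls_ik(target, q0, damping=0.05, iters=200, tol=1e-6):
    """Damped least squares: dq = J^T (J J^T + lambda^2 I)^-1 e, iterated."""
    q = np.array(q0, dtype=float)
    for _ in range(iters):
        err = target - forward_kinematics(q)
        if np.linalg.norm(err) < tol:
            break
        J = jacobian(q)
        # Damping keeps the step bounded near singular (fully stretched) poses.
        q += J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(2), err)
    return q

# Usage: solve for a reachable target starting from a non-singular pose.
target = np.array([0.35, 0.2])
q = dls_ik(target, q0=[0.1, 0.5])
print(q, forward_kinematics(q))
```

The damping term trades a little convergence speed for stability near singularities, which is one reason damped least squares is a common choice for tracking teleoperated hand targets.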
