Mobile-TeleVision: Predictive Motion Priors for Humanoid Whole-Body Control

Chenhao Lu, Xuxin Cheng, Jialong Li, Shiqi Yang, Mazeyu Ji, Chengjing Yuan, Ge Yang, Sha Yi, Xiaolong Wang

TL;DR

This work tackles the dichotomy between precise upper-body manipulation and robust locomotion in humanoid robots by decoupling the control streams and introducing Predictive Motion Priors (PMP) learned via a CVAE to encode future upper-body motion. The lower-body policy is trained with PPO to leverage PMP as an informative state, while the upper body is controlled through IK or motion retargeting, enabling precise, load-bearing manipulation during movement. Across simulation and real-world experiments on Unitree H1 (and GR1 in simulation), PMP improves upper-body manipulation precision and stabilizes locomotion under perturbations, outperforming RL-only whole-body baselines. The approach enables effective teleoperation with separate or unified control schemes, advancing practical loco-manipulation for remote operation in diverse environments.

Abstract

Humanoid robots require both robust lower-body locomotion and precise upper-body manipulation. While recent Reinforcement Learning (RL) approaches provide whole-body loco-manipulation policies, they lack precise manipulation with high-DoF arms. In this paper, we propose decoupling upper-body control from locomotion, using inverse kinematics (IK) and motion retargeting for precise manipulation, while RL focuses on robust lower-body locomotion. We introduce PMP (Predictive Motion Priors), trained with a Conditional Variational Autoencoder (CVAE) to effectively represent upper-body motions. The locomotion policy is trained conditioned on this upper-body motion representation, ensuring that the system remains robust during both manipulation and locomotion. We show that CVAE features are crucial for stability and robustness, and that PMP significantly outperforms RL-based whole-body control in precise manipulation. With precise upper-body motion and robust lower-body locomotion control, operators can remotely control the humanoid to walk around and explore different environments while performing diverse manipulation tasks.
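To make the role of the CVAE more concrete, the following is a minimal sketch of a generic CVAE training objective and of how a latent motion feature could be appended to the locomotion policy's observation. It is not the authors' implementation: the loss form (squared reconstruction error plus a KL term against a standard normal), the `beta` weight, and the helper names `cvae_loss`, `reparameterize`, and `build_lower_body_obs` are all assumptions chosen for illustration.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def cvae_loss(x_future, x_recon, mu, log_var, beta=1.0):
    """Generic CVAE objective (assumed form, not taken from the paper).

    x_future, x_recon: (batch, motion_dim) target and reconstructed future motion.
    mu, log_var:       (batch, latent_dim) parameters of the diagonal Gaussian posterior.
    Returns (total, reconstruction, kl), each averaged over the batch.
    """
    recon = np.mean(np.sum((x_recon - x_future) ** 2, axis=-1))
    # Closed-form KL divergence between N(mu, diag(exp(log_var))) and N(0, I).
    kl = np.mean(-0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1))
    return recon + beta * kl, recon, kl

def build_lower_body_obs(proprio, command, z):
    """Hypothetical observation layout: proprioception, velocity command, motion latent."""
    return np.concatenate([proprio, command, z], axis=-1)
```

In this sketch the lower-body policy never sees raw future arm targets; it only receives the latent `z`, which is one plausible reading of "conditioned on this upper-body motion representation".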

Paper Structure

This paper contains 14 sections, 4 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: The training pipeline consists of three stages: (a) preprocessing of the motion dataset by mapping local rotation, (b) training a CVAE to capture prior knowledge of upper body human motion, and (c) RL training where the upper body is controlled using sampled target joint positions and the lower body is trained using prior motion representations.
  • Figure 2: Evaluation for precision (Top) and stability (Bottom) under disturbance (lower is better for all figures). We sample $5$ trajectories for each motion from the motion dataset in simulation and report their mean episode metrics for each type of disturbance. The locomotion commands are from the base state of the corresponding motion. The robot is pushed every 5 seconds by a sudden increase in velocity to a value of $\texttt{push\_vel}$. Motion speed is changed by multiplying the frame rate of retargeted motion by the corresponding factor. PMP shows lower values, which correspond to better performance.
  • Figure 3: Left: A unified teleoperation setup using Apple Vision Pro for upper body control and pedals for lower body locomotion. Right: The H1 robot with a customized head, neck, and dual 6-DoF dexterous hands, featuring an active neck and stereo camera for immersive teleoperation [cheng2024tv].
  • Figure 4: Robustness testing. Left: robot being pushed while standing; Right: robot recovers to the stable standing pose.
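The disturbance protocol described in the Figure 2 caption (a push every 5 seconds that sets the base velocity to `push_vel`, and motion playback scaled by a speed factor) can be sketched as below. This is an illustrative reconstruction only: the function names, the random push direction in the horizontal plane, and the nearest-lower-frame lookup are assumptions, since the caption does not specify them.

```python
import numpy as np

def push_steps(episode_len_s, dt, push_interval_s=5.0):
    """Simulation step indices at which a push is applied (one every push_interval_s)."""
    steps_per_push = int(round(push_interval_s / dt))
    n_steps = int(round(episode_len_s / dt))
    return list(range(steps_per_push, n_steps, steps_per_push))

def apply_push(push_vel, rng):
    """Return a planar base velocity of magnitude push_vel.

    Assumption: the push direction is drawn uniformly at random in the xy-plane.
    """
    angle = rng.uniform(0.0, 2.0 * np.pi)
    return push_vel * np.array([np.cos(angle), np.sin(angle)])

def frame_index(t, fps, speed_factor, n_frames):
    """Frame of the retargeted motion to play at time t.

    Scaling the frame rate by speed_factor means the motion advances
    fps * speed_factor frames per second; the index is clamped to the last frame.
    """
    idx = int(t * fps * speed_factor)
    return min(idx, n_frames - 1)
```

For example, with a 20-second episode and a 0.02 s control step, `push_steps(20.0, 0.02)` yields pushes at steps 250, 500, and 750.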