Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation
Tairan He, Zhengyi Luo, Wenli Xiao, Chong Zhang, Kris Kitani, Changliu Liu, Guanya Shi
TL;DR
This work introduces H2O, an RL-based framework for real-time whole-body humanoid teleoperation using only an RGB camera. It combines a motion retargeting pipeline with a scalable "sim-to-data" filtering process and domain randomization to train a robust motion imitator that transfers zero-shot to real hardware. The system reproduces diverse, dynamic whole-body motions such as walking and kicking, and performs well in both simulation and real-world tests. The study also outlines practical limitations and future directions toward universal, real-time, and fully embodied humanoid teleoperation.
Abstract
We present Human to Humanoid (H2O), a reinforcement learning (RL) based framework that enables real-time whole-body teleoperation of a full-sized humanoid robot with only an RGB camera. To create a large-scale retargeted motion dataset of human movements for humanoid robots, we propose a scalable "sim-to-data" process that uses a privileged motion imitator to select feasible motions. We then train a robust real-time humanoid motion imitator in simulation on these refined motions and transfer it to the real humanoid robot in a zero-shot manner. We successfully achieve teleoperation of dynamic whole-body motions in real-world scenarios, including walking, back jumping, kicking, turning, waving, pushing, and boxing. To the best of our knowledge, this is the first demonstration of learning-based real-time whole-body humanoid teleoperation.
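The "sim-to-data" filtering described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `rollout` callable (standing in for running the privileged motion imitator in simulation), the per-frame tracking-error representation, and the 0.5 m threshold are all assumptions made for illustration. The idea shown is only that a retargeted clip is kept when the privileged imitator can track it closely for its whole duration.

```python
from typing import Callable, Iterable, List, Sequence


def filter_feasible(
    clip_names: Iterable[str],
    rollout: Callable[[str], Sequence[float]],
    max_error: float = 0.5,
) -> List[str]:
    """Keep clips that a (hypothetical) privileged imitator can track.

    `rollout(name)` is assumed to return the per-frame tracking error in
    metres from one simulated rollout of the clip. A clip is considered
    feasible when every frame stays within `max_error` (an assumed
    threshold, not a value taken from the paper).
    """
    kept = []
    for name in clip_names:
        errors = rollout(name)
        # Empty rollouts carry no evidence of feasibility, so drop them.
        if len(errors) > 0 and max(errors) <= max_error:
            kept.append(name)
    return kept


# Illustrative stand-in for simulation results; the numbers are made up.
fake_rollouts = {
    "walk": [0.05, 0.10, 0.08],
    "handstand": [0.10, 0.90, 1.40],  # imitator loses track of the motion
    "wave": [0.02, 0.03],
}

feasible = filter_feasible(fake_rollouts, fake_rollouts.__getitem__)
print(feasible)
```

In this sketch the filtered list would then serve as the training set for the real-time imitator; a stricter or looser `max_error` trades dataset size against motion feasibility.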
