OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

Tairan He; Zhengyi Luo; Xialin He; Wenli Xiao; Chong Zhang; Weinan Zhang; Kris Kitani; Changliu Liu; Guanya Shi

TL;DR

OmniH2O presents a unified, learning-based pipeline for universal, dexterous whole-body humanoid control. The pipeline supports teleoperation through multiple interfaces. It also supports autonomous execution, either learned from demonstrations or driven by frontier-model outputs. It introduces three components: kinematic pose as an intermediary control representation, a teacher-student sim-to-real distillation framework, and a new dataset (OmniH2O-6) covering six tasks. Together these enable robust motion imitation and subsequent autonomous policies. The approach demonstrates strong real-world motion tracking, versatile human interfaces (VR, language, RGB), and viable autonomy through GPT-4o integration and diffusion-based imitation learning, highlighting the practical potential of scalable humanoid teleoperation and learning. Limitations include reliance on accurate root odometry and limited safety guarantees. Future work targets stairs, richer sensing, and more robust safety mechanisms.
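To make the idea of a kinematic-pose intermediary concrete, here is a minimal sketch of how several input sources could reduce to one shared target. This is a hypothetical design for illustration, not the authors' code. It assumes the deployable policy consumes only the head and hand positions, expressed relative to the robot root. All names (`KinematicPose`, `from_vr`, `from_full_body`, `to_goal_vector`, `SPARSE_KEYS`) are invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

# Assumption: the policy's goal is built from these sparse keypoints only.
SPARSE_KEYS = ("head", "left_hand", "right_hand")


@dataclass(frozen=True)
class KinematicPose:
    """Hypothetical universal target: world-frame positions of named keypoints."""
    keypoints: Dict[str, Vec3]


def from_vr(headset: Vec3, left_ctrl: Vec3, right_ctrl: Vec3) -> KinematicPose:
    # A VR headset and two hand trackers directly provide the sparse keypoints.
    return KinematicPose({"head": headset, "left_hand": left_ctrl, "right_hand": right_ctrl})


def from_full_body(joints: Dict[str, Vec3]) -> KinematicPose:
    # An RGB pose estimator or a text-to-motion model yields a full skeleton.
    # Keep only the keypoints the policy consumes and drop the rest.
    return KinematicPose({k: joints[k] for k in SPARSE_KEYS})


def to_goal_vector(pose: KinematicPose, root: Vec3) -> List[float]:
    # Express each keypoint relative to the robot root, flattened in a fixed order.
    goal: List[float] = []
    for k in SPARSE_KEYS:
        p = pose.keypoints[k]
        goal.extend(p[i] - root[i] for i in range(3))
    return goal
```

In this sketch, a VR frame and a full-body skeleton with the same head and hand positions map to identical goal vectors. That property is what lets one tracking policy serve every interface.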

Abstract

We present OmniH2O (Omni Human-to-Humanoid), a learning-based system for whole-body humanoid teleoperation and autonomy. Using kinematic pose as a universal control interface, OmniH2O enables various ways for a human to control a full-sized humanoid with dexterous hands, including real-time teleoperation through a VR headset, verbal instruction, and an RGB camera. OmniH2O also enables full autonomy by learning from teleoperated demonstrations or integrating with frontier models such as GPT-4. OmniH2O demonstrates versatility and dexterity in various real-world whole-body tasks through teleoperation or autonomy, such as playing multiple sports, moving and manipulating objects, and interacting with humans. We develop an RL-based sim-to-real pipeline with three parts: large-scale retargeting and augmentation of human motion datasets, learning a real-world deployable policy with sparse sensor input by imitating a privileged teacher policy, and reward designs that enhance robustness and stability. We release the first humanoid whole-body control dataset, OmniH2O-6, containing six everyday tasks, and demonstrate humanoid whole-body skill learning from teleoperated datasets.
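The step "learning a real-world deployable policy with sparse sensor input by imitating a privileged teacher policy" can be illustrated with a toy teacher-student distillation. The source states only that the student is trained by supervised learning from an RL-trained teacher. The DAgger-style loop used here is our choice for the example. In it, the student drives the rollouts and the teacher relabels the visited states (the variant with no teacher mixing). Everything in the sketch is hypothetical: the three-entry state, the linear policies, the toy dynamics, and the hyperparameters.

```python
import random
from typing import List, Tuple

# Toy state (hypothetical): [obs0, obs1, priv]. The teacher sees all three
# entries; the student sees only the two "sparse" observation entries.
TEACHER_W = (0.5, -0.3, 0.8)


def teacher_action(s: List[float]) -> float:
    # Stand-in for an RL-trained teacher that uses privileged information.
    return sum(w * x for w, x in zip(TEACHER_W, s))


class Student:
    def __init__(self) -> None:
        self.w = [0.0, 0.0]
        self.b = 0.0

    def act(self, s: List[float]) -> float:
        # Uses only the non-privileged part of the state.
        return self.w[0] * s[0] + self.w[1] * s[1] + self.b

    def sgd_step(self, s: List[float], target: float, lr: float) -> None:
        # One gradient step on 0.5 * (act(s) - target) ** 2.
        err = self.act(s) - target
        self.w[0] -= lr * err * s[0]
        self.w[1] -= lr * err * s[1]
        self.b -= lr * err


def rollout(student: Student, rng: random.Random, steps: int) -> List[List[float]]:
    # Toy dynamics; the privileged entry (e.g. a terrain parameter) is fixed per episode.
    s = [rng.gauss(0, 1), rng.gauss(0, 1), rng.uniform(-1, 1)]
    visited = []
    for _ in range(steps):
        visited.append(list(s))
        a = student.act(s)  # the student, not the teacher, drives the rollout
        s = [0.8 * s[0] + 0.1 * a + rng.gauss(0, 0.5),
             0.8 * s[1] + rng.gauss(0, 0.5),
             s[2]]
    return visited


def distill(iterations: int = 5, episodes: int = 40, steps: int = 20,
            epochs: int = 3, lr: float = 0.01, seed: int = 0) -> Student:
    rng = random.Random(seed)
    student = Student()
    dataset: List[Tuple[List[float], float]] = []
    for _ in range(iterations):
        for _ in range(episodes):
            for s in rollout(student, rng, steps):
                dataset.append((s, teacher_action(s)))  # teacher relabels visited states
        for _ in range(epochs):  # supervised regression on the aggregated dataset
            rng.shuffle(dataset)
            for s, target in dataset:
                student.sgd_step(s, target, lr)
    return student


def mse(student: Student, states: List[List[float]]) -> float:
    return sum((student.act(s) - teacher_action(s)) ** 2 for s in states) / len(states)
```

Because the student cannot observe `priv`, its error against the teacher cannot fall to zero. The residual gap is the part of the teacher's behavior that depends on information unavailable at deployment, which is why sparse-input distillation is a nontrivial step rather than a copy.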

Paper Structure (29 sections, 11 figures, 18 tables)

Introduction
Related Works
Universal and Dexterous Human-to-Humanoid Whole-Body Control
Problem Formulation
Experimental Results
Whole-body Motion Tracking
Simulation Motion-Tracking Results
Real-world Motion-Tracking Results
Human Control via Universal Interfaces
Autonomy via Frontier Models or Imitation Learning
Limitations and Future Work
Real Robot System Setup
Simulation Baseline and Ablations
State Space Compositions
LfD Baselines
...and 14 more sections

Figures (11)

Figure 1: (a) OmniH2O enables teleoperating a full-size humanoid robot (Unitree H1) to complete tasks that require both high-precision manipulation and locomotion. (b) OmniH2O also enables full autonomy through visual input, controlled by GPT-4o or a policy learned from teleoperated demonstrations. Videos are available on our website: https://omni.human2humanoid.com
Figure 2: (a) Source motion; (b) Retargeted motion; (c) Standing variant; (d) Squatting variant.
Figure 3: (a) OmniH2O retargets large-scale human motions and filters out motions that are infeasible for humanoids. (b) Our sim-to-real policy is distilled through supervised learning from an RL-trained teacher policy that uses privileged information. (c) The universal design of OmniH2O supports versatile human control interfaces, including a VR headset, an RGB camera, and language. Our system can also be controlled by autonomous agents such as GPT-4o or by an imitation learning policy trained on our dataset collected via teleoperation.
Figure 4: OmniH2O policy tracks motion goals from a language-based human motion generative model tevet2022human.
Figure 5: OmniH2O shows superior robustness against human strikes and different outdoor terrains.
...and 6 more figures
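Figures 2 and 3(a) describe a data pipeline with two stages. Retargeted human motions are first filtered for feasibility, then augmented with standing and squatting variants. Below is a minimal sketch of that pipeline under loudly stated assumptions. First, the filter is a simple heuristic we substitute for illustration: it applies a threshold on per-frame retargeting error plus joint-limit checks, and the paper may decide feasibility differently. Second, we read the standing and squatting variants as keeping each frame's upper-body angles while overwriting the legs with a fixed posture. The joint names, postures, limits, and threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical lower-body joint names for an H1-like humanoid (illustrative only).
LOWER_BODY = ("left_hip", "left_knee", "left_ankle",
              "right_hip", "right_knee", "right_ankle")

# Hypothetical fixed lower-body postures (radians) used to build the variants.
STANDING = {j: 0.0 for j in LOWER_BODY}
SQUATTING = {"left_hip": -1.0, "left_knee": 2.0, "left_ankle": -1.0,
             "right_hip": -1.0, "right_knee": 2.0, "right_ankle": -1.0}


@dataclass
class Clip:
    name: str
    frames: List[Dict[str, float]]  # joint angles (rad) per frame


def is_feasible(clip: Clip, retarget_err: List[float],
                limits: Dict[str, Tuple[float, float]], max_err: float) -> bool:
    # Heuristic filter (assumption): drop clips that retargeted poorly
    # or that push any limited joint outside its range.
    if max(retarget_err) > max_err:
        return False
    return all(lo <= frame[j] <= hi
               for frame in clip.frames
               for j, (lo, hi) in limits.items())


def with_lower_body(clip: Clip, posture: Dict[str, float], suffix: str) -> Clip:
    # Keep each frame's upper-body angles; overwrite the legs with a fixed posture.
    return Clip(f"{clip.name}_{suffix}", [{**frame, **posture} for frame in clip.frames])


def build_dataset(retargeted: List[Tuple[Clip, List[float]]],
                  limits: Dict[str, Tuple[float, float]],
                  max_err: float = 0.1) -> List[Clip]:
    dataset: List[Clip] = []
    for clip, err in retargeted:
        if not is_feasible(clip, err, limits, max_err):
            continue  # infeasible motions never reach training
        dataset.append(clip)
        dataset.append(with_lower_body(clip, STANDING, "stand"))
        dataset.append(with_lower_body(clip, SQUATTING, "squat"))
    return dataset
```

The sketch filters before it augments, so each kept clip contributes three training clips that share the same upper-body motion. This matches the panel order of Figure 2: source, retargeted, standing variant, squatting variant.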
