HumanPlus: Humanoid Shadowing and Imitation from Humans

Zipeng Fu; Qingqing Zhao; Qi Wu; Gordon Wetzstein; Chelsea Finn

TL;DR

HumanPlus tackles the challenge of teaching humanoids from human data by combining a real-time shadowing teleoperation pipeline with a vision-based imitation learner. The approach trains a low-level, task-agnostic policy in simulation and transfers it to hardware, where the robot shadows human motion using a single RGB camera; the real-world data collected this way is then used to train skill policies with the Humanoid Imitation Transformer on egocentric vision. The system demonstrates autonomous execution of diverse whole-body tasks with 60-100% success rates using up to 40 demonstrations, and outperforms baselines in teleoperation robustness and vision-informed imitation. The work also discusses hardware and perception limitations and outlines directions for broader skill coverage and more natural human-humanoid alignment.

Abstract

One of the key arguments for building robots that have similar form factors to human beings is that we can leverage massive human data for training. Yet, doing so has remained challenging in practice due to the complexities in humanoid perception and control, lingering physical gaps between humanoids and humans in morphologies and actuation, and the lack of a data pipeline for humanoids to learn autonomous skills from egocentric vision. In this paper, we introduce a full-stack system for humanoids to learn motion and autonomous skills from human data. We first train a low-level policy in simulation via reinforcement learning using existing 40-hour human motion datasets. This policy transfers to the real world and allows humanoid robots to follow human body and hand motion in real time using only an RGB camera, i.e., shadowing. Through shadowing, human operators can teleoperate humanoids to collect whole-body data for learning different tasks in the real world. Using the data collected, we then perform supervised behavior cloning to train skill policies using egocentric vision, allowing humanoids to complete different tasks autonomously by imitating human skills. We demonstrate the system on our customized 33-DoF 180cm humanoid, autonomously completing tasks such as wearing a shoe to stand up and walk, unloading objects from warehouse racks, folding a sweatshirt, rearranging objects, typing, and greeting another robot with 60-100% success rates using up to 40 demonstrations. Project website: https://humanoid-ai.github.io/
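
The supervised behavior cloning step mentioned in the abstract can be illustrated with a deliberately tiny sketch. This is not the authors' method: the paper trains a transformer on egocentric images, whereas the toy below fits a scalar linear policy to hypothetical (observation, action) pairs by gradient descent on mean squared error. The function name `fit_bc_policy`, the synthetic demonstrations, and all hyperparameters are assumptions made for illustration only.

```python
import random


def fit_bc_policy(demos, lr=0.1, epochs=500):
    """Fit a linear policy a = w * o + b to (observation, action) pairs
    by minimizing mean squared error with full-batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(demos)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * o + b - a) * o for o, a in demos) / n
        grad_b = sum(2 * (w * o + b - a) for o, a in demos) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical stand-in for teleoperated data: 40 scalar observations,
# each labeled with the action of a made-up "expert" a = 2.0 * o - 0.5.
rng = random.Random(0)
observations = [rng.uniform(-1.0, 1.0) for _ in range(40)]
demos = [(o, 2.0 * o - 0.5) for o in observations]

w, b = fit_bc_policy(demos)
print(f"learned policy: a = {w:.3f} * o + {b:.3f}")
```

The structure mirrors the general recipe: collect demonstrations, then regress actions on observations. A real system would replace the scalar observation with images and proprioception, and the linear map with a learned network.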

Paper Structure

This paper contains 19 sections, 6 figures, and 5 tables.

Introduction
Related Work
Reinforcement Learning for Humanoids.
Teleoperation of Humanoids.
Robot Learning from Human Data.
HumanPlus Hardware
Human Body and Hand Data
Offline Human Data.
Retargeting.
Real-Time Body Pose Estimation and Retargeting.
Real-Time Hand Pose Estimation and Retargeting.
Shadowing of Human Motion
Imitation of Human Skills
Tasks
Experiments on Shadowing
...and 4 more sections

Figures (6)

Figure 1: Hardware Details. Our HumanPlus robot has two egocentric RGB cameras mounted on the head, two 6-DoF dexterous hands, and 33 degrees of freedom in total.
Figure 2: Shadowing and Retargeting. Our system uses one RGB camera for body and hand pose estimation.
Figure 3: Model Architectures. Our system consists of a decoder-only transformer for low-level control, Humanoid Shadowing Transformer, and a decoder-only transformer for imitation learning, Humanoid Imitation Transformer.
Figure 4: Task Definitions. We illustrate 5 autonomous tasks through imitation learning, and 5 shadowing tasks. Details are in the Tasks section.
Figure 5: Baseline Teleoperation Systems.
...and 1 more figures
