Humanoid Policy ~ Human Policy
Ri-Zhao Qiu, Shiqi Yang, Xuxin Cheng, Chaitanya Chawla, Jialong Li, Tairan He, Ge Yan, David J. Yoon, Ryan Hoque, Lars Paulsen, Ge Yang, Jian Zhang, Sha Yi, Guanya Shi, Xiaolong Wang
TL;DR
This paper tackles the costly bottleneck of collecting robot demonstrations by leveraging egocentric human data. It introduces PH^2D, a task-oriented egocentric dataset with accurate 3D hand/finger poses, and HAT (Human Action Transformer), a transformer-based policy that unifies the human and humanoid state-action spaces and retargets its outputs to robot actions differentiably. Empirical results show that co-training with human data improves out-of-distribution generalization and data-collection efficiency, enabling robust cross-embodiment manipulation across different humanoids. The work points to a scalable path toward humanoid manipulation by treating humans as a rich data source for cross-embodiment learning.
Abstract
Training manipulation policies for humanoid robots with diverse data enhances their robustness and generalization across tasks and platforms. However, learning solely from robot demonstrations is labor-intensive, requiring expensive tele-operated data collection which is difficult to scale. This paper investigates a more scalable data source, egocentric human demonstrations, to serve as cross-embodiment training data for robot learning. We mitigate the embodiment gap between humanoids and humans from both the data and modeling perspectives. We collect an egocentric task-oriented dataset (PH^2D) that is directly aligned with humanoid manipulation demonstrations. We then train a human-humanoid behavior policy, which we term Human Action Transformer (HAT). The state-action space of HAT is unified for both humans and humanoid robots and can be differentiably retargeted to robot actions. Co-trained with smaller-scale robot data, HAT directly models humanoid robots and humans as different embodiments without additional supervision. We show that human data improves both generalization and robustness of HAT with significantly better data collection efficiency. Code and data: https://human-as-robot.github.io/
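To make the co-training idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes human and humanoid demonstrations have already been converted into one shared state-action format and shows how a training batch could mix a large human pool with a smaller robot pool. The `Sample` fields, the `sample_cotrain_batch` helper, and the 0.75 human fraction are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One demonstration step in a hypothetical unified state-action format."""
    embodiment: str      # "human" or "humanoid"
    state: List[float]   # assumed shared state vector (e.g. head and wrist poses)
    action: List[float]  # assumed shared action vector (e.g. target hand poses)


def sample_cotrain_batch(
    human: List[Sample],
    robot: List[Sample],
    batch_size: int,
    human_fraction: float,
    rng: random.Random,
) -> List[Sample]:
    """Draw one mixed batch with a fixed share of human samples.

    Sampling is with replacement, so the smaller robot pool can still
    fill its share of every batch.
    """
    n_human = round(batch_size * human_fraction)
    n_robot = batch_size - n_human
    batch = rng.choices(human, k=n_human) + rng.choices(robot, k=n_robot)
    rng.shuffle(batch)  # avoid a fixed human-then-robot ordering
    return batch


# Illustrative pools: many human samples, few robot samples.
rng = random.Random(0)
human_pool = [Sample("human", [0.0] * 4, [0.0] * 4) for _ in range(10)]
robot_pool = [Sample("humanoid", [1.0] * 4, [1.0] * 4) for _ in range(3)]
batch = sample_cotrain_batch(human_pool, robot_pool, 8, 0.75, rng)
```

Because both embodiments share one sample type in this sketch, a single policy network could consume the batch without embodiment-specific branches, which is the property the abstract attributes to the unified state-action space.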
