
H-Zero: Cross-Humanoid Locomotion Pretraining Enables Few-shot Novel Embodiment Transfer

Yunfeng Lin, Minghuan Liu, Yufei Xue, Ming Zhou, Yong Yu, Jiangmiao Pang, Weinan Zhang

TL;DR

H-Zero introduces a cross-embodiment locomotion pretraining framework that learns a unified base policy for humanoids. It does this by standardizing control semantics, diversifying embodiment morphologies, and applying embodiment-aware learning. The pretrained policy transfers zero-shot and few-shot to unseen robots, needs only short fine-tuning, and supports robust sim-to-real transfer. Key contributions include a hardware-agnostic joint representation, embodiment descriptors, and dynamic per-embodiment training strategies that balance exploration and learning progress. The approach substantially improves transferability over single-embodiment training and offers a scalable path toward general humanoid locomotion across diverse platforms.
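One common way to make a joint representation hardware-agnostic is to map every robot's joints into a fixed set of canonical slots, with a mask marking which slots the robot actually has. The sketch below shows that idea and a toy embodiment descriptor. It is an illustrative assumption, not H-Zero's implementation: the slot names in `CANONICAL_JOINTS`, the `Embodiment` fields, and the helpers `to_canonical`, `from_canonical`, and `embodiment_descriptor` are all hypothetical.

```python
from dataclasses import dataclass

import numpy as np

# Hypothetical canonical joint slots shared by every humanoid. The names and
# ordering are illustrative assumptions, not the layout used by H-Zero.
CANONICAL_JOINTS = [
    "l_hip_pitch", "l_hip_roll", "l_hip_yaw", "l_knee", "l_ankle_pitch", "l_ankle_roll",
    "r_hip_pitch", "r_hip_roll", "r_hip_yaw", "r_knee", "r_ankle_pitch", "r_ankle_roll",
    "torso_yaw",
]
SLOT = {name: i for i, name in enumerate(CANONICAL_JOINTS)}


@dataclass
class Embodiment:
    name: str
    joint_names: list  # robot-specific joint order
    height: float      # example physical parameters (assumed descriptor fields)
    mass: float


def to_canonical(emb, q):
    """Scatter robot-ordered joint values into canonical slots and return a presence mask."""
    values = np.zeros(len(CANONICAL_JOINTS))
    mask = np.zeros(len(CANONICAL_JOINTS))
    for name, v in zip(emb.joint_names, q):
        if name in SLOT:  # joints without a canonical slot are ignored in this sketch
            values[SLOT[name]] = v
            mask[SLOT[name]] = 1.0
    return values, mask


def from_canonical(emb, action):
    """Gather a canonical-slot action back into the robot's own joint order (unmapped joints get 0)."""
    return np.array([action[SLOT[n]] if n in SLOT else 0.0 for n in emb.joint_names])


def embodiment_descriptor(emb):
    """Toy descriptor: joint-presence mask followed by a few scalar body parameters."""
    _, mask = to_canonical(emb, np.zeros(len(emb.joint_names)))
    return np.concatenate([mask, [emb.height, emb.mass]])


# Usage: a toy robot whose joint order differs from the canonical one.
robot = Embodiment("toy_biped", ["r_knee", "l_knee", "torso_yaw"], height=1.2, mass=35.0)
obs, mask = to_canonical(robot, np.array([0.3, -0.1, 0.05]))
cmd = from_canonical(robot, obs)  # round trip back to robot order, for illustration
desc = embodiment_descriptor(robot)
```

With this layout, a single policy network always sees inputs and outputs of the same size. The mask and descriptor tell it which embodiment it is controlling.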

Abstract

The rapid advancement of humanoid robotics has intensified the need for robust and adaptable controllers to enable stable and efficient locomotion across diverse platforms. However, developing such controllers remains a significant challenge because existing solutions are tailored to specific robot designs, requiring extensive tuning of reward functions, physical parameters, and training hyperparameters for each embodiment. To address this challenge, we introduce H-Zero, a cross-humanoid locomotion pretraining pipeline that learns a generalizable humanoid base policy. We show that pretraining on a limited set of embodiments enables zero-shot and few-shot transfer to novel humanoid robots with minimal fine-tuning. Evaluations show that the pretrained policy maintains up to 81% of the full episode duration on unseen robots in simulation while enabling few-shot transfer to unseen humanoids and upright quadrupeds within 30 minutes of fine-tuning.
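The "fraction of the full episode duration" figure can be read as the mean episode length divided by the maximum episode length. Under that reading, a robot that never falls scores 1.0. This interpretation is an assumption; the abstract does not give the exact definition. The helper name `episode_retention` is hypothetical, and the numbers in the example are made up for illustration, not results from the paper.

```python
import numpy as np


def episode_retention(episode_lengths, max_episode_length):
    """Mean episode length as a fraction of the maximum (full) episode length.

    Episodes that terminate early (e.g. the robot falls) contribute less than 1.0;
    episodes that reach the time limit contribute exactly 1.0.
    """
    lengths = np.asarray(episode_lengths, dtype=float)
    if lengths.size == 0:
        raise ValueError("need at least one episode")
    # Clamp so that a logging overrun cannot push the score above 1.0.
    return float(np.mean(np.minimum(lengths, max_episode_length)) / max_episode_length)


# Illustrative numbers only: three rollouts with a 1000-step time limit.
print(episode_retention([1000, 400, 700], 1000))  # 0.7
```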

Paper Structure

This paper contains 17 sections, 5 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Left: We propose a locomotion pretraining pipeline for humanoids by mixing multiple randomized embodiments into the training set. Middle: The pretrained policy shows moderate adaptability to unseen embodiments and real hardware. Right: Fine-tuning the pretrained policy achieves stable control on unseen robots with minimal additional training time.
  • Figure 2: Method overview. a) The policy is pretrained by learning on a diverse set of humanoid embodiments through multi-robot simulation with unified control. Training progress is dynamically balanced with embodiment-wise exploration and gradient updates. b) At deployment, the pretrained policy supports few-shot adaptation to novel robots.
  • Figure 3: t-SNE [vandermaaten08a] visualization of rollout trajectories and embodiment descriptors under different domain randomization (DR). Left: standard DR for single-robot training. Right: extended DR proposed in Sec. [sec:mix] for cross-embodiment pretraining. With extended DR, trajectories from unseen robots without randomization overlap with the broadened training distribution, demonstrating improved transferability.
  • Figure 4: t-SNE convex hulls of embodiment parameters under standard (dashed) and quadrupled (solid) DR ranges, which retain clear boundaries even under strong randomization.
  • Figure 5: Mean episode length of cross-embodiment training runs (H1, N1, G1, and A1) under different training strategies.
  • ...and 2 more figures
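Figures 3 and 4 contrast standard DR ranges with an extended DR whose ranges are quadrupled. The sketch below shows one way to build such ranges. It assumes that DR ranges are (low, high) intervals on multiplicative scale factors and that "quadrupled" means each interval's width grows fourfold about its midpoint. Both are assumptions, and the figure may define the widening differently. The parameter names in `STANDARD_DR` and the helpers `widen_range`, `extended_dr`, and `sample_embodiment_params` are hypothetical.

```python
import numpy as np

# Hypothetical standard DR ranges, expressed as multiplicative scale factors on
# nominal robot parameters. Names and bounds are illustrative assumptions.
STANDARD_DR = {
    "link_mass_scale": (0.9, 1.1),
    "joint_stiffness_scale": (0.95, 1.05),
    "friction_scale": (0.8, 1.2),
}


def widen_range(lo, hi, factor):
    """Scale an interval's width by `factor` about its midpoint."""
    mid = 0.5 * (lo + hi)
    half = 0.5 * (hi - lo) * factor
    return mid - half, mid + half


def extended_dr(ranges, factor=4.0, floor=1e-3):
    """Widen every DR interval, clamping the lower bound so scale factors stay positive."""
    out = {}
    for name, (lo, hi) in ranges.items():
        new_lo, new_hi = widen_range(lo, hi, factor)
        out[name] = (max(new_lo, floor), new_hi)
    return out


def sample_embodiment_params(ranges, rng):
    """Draw one randomized embodiment by sampling each scale factor uniformly in its range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}


# Usage: mass scale (0.9, 1.1) widens to roughly (0.6, 1.4).
wide = extended_dr(STANDARD_DR)
rng = np.random.default_rng(0)
params = sample_embodiment_params(wide, rng)
```

The positivity floor matters only when the widened interval would cross zero. For example, a factor of 6 applied to (0.8, 1.2) would otherwise produce a negative friction scale.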