Scalable and General Whole-Body Control for Cross-Humanoid Locomotion

Yufei Xue; YunFeng Lin; Wentao Dong; Yang Tang; Jingbo Wang; Jiangmiao Pang; Ming Zhou; Minghuan Liu; Weinan Zhang

TL;DR

XHugWBC tackles cross-humanoid whole-body control by training a single generalist policy with physics-consistent morphological randomization and a universal, semantically aligned joint-space representation. The policy uses a GCN or Transformer encoder over an embodiment graph to fuse kinematic topology with proprioceptive history, supported by a state estimator and a node-wise action decoder that maps outputs to each robot's joints. The policy generalizes to twelve simulated humanoids and seven real humanoids, achieving approximately 85% of specialist performance on unseen robots and a 100% survival rate in zero-shot real-world tests; fine-tuning further improves results. Real-world demonstrations include teleoperation-driven, long-horizon loco-manipulation tasks, indicating practical, scalable transfer without per-robot retraining.
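As a rough illustration of the encoder and decoder idea summarized above, the following sketch applies one graph-convolution layer over a toy embodiment graph and then decodes one action per joint node. It assumes the common symmetric-normalized GCN layer; the graph, feature sizes, random weights, and the function names `normalized_adjacency`, `gcn_layer`, and `decode_actions` are hypothetical and are not taken from the paper.

```python
import numpy as np

def normalized_adjacency(edges, num_nodes):
    """Symmetric-normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    adj = np.eye(num_nodes)
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    deg_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]

def gcn_layer(a_hat, features, weight):
    """One graph-convolution layer with ReLU: relu(A_hat @ X @ W)."""
    return np.maximum(a_hat @ features @ weight, 0.0)

def decode_actions(node_embeddings, decoder_weight):
    """Node-wise decoder: a shared linear head yields one scalar action per joint node."""
    return node_embeddings @ decoder_weight

rng = np.random.default_rng(0)

# Hypothetical 5-joint graph: hip(0)-knee(1)-ankle(2) and hip(0)-waist(3)-shoulder(4).
edges = [(0, 1), (1, 2), (0, 3), (3, 4)]
num_joints, feat_dim, hidden_dim = 5, 6, 8

a_hat = normalized_adjacency(edges, num_joints)
joint_features = rng.normal(size=(num_joints, feat_dim))  # stand-in for per-joint proprioception
w_enc = rng.normal(size=(feat_dim, hidden_dim))           # untrained illustrative weights
w_dec = rng.normal(size=(hidden_dim,))

embeddings = gcn_layer(a_hat, joint_features, w_enc)
actions = decode_actions(embeddings, w_dec)
print(actions.shape)
```

Because the decoder head is shared across nodes, the same weights can in principle serve robots with different joint counts; only the graph and the number of rows in the feature matrix change.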

Abstract

Learning-based whole-body controllers have become a key driver for humanoid robots, yet most existing approaches require robot-specific training. In this paper, we study the problem of cross-embodiment humanoid control and show that a single policy can robustly generalize across a wide range of humanoid robot designs with one-time training. We introduce XHugWBC, a novel cross-embodiment training framework that enables generalist humanoid control through: (1) physics-consistent morphological randomization, (2) semantically aligned observation and action spaces across diverse humanoid robots, and (3) effective policy architectures modeling morphological and dynamical properties. XHugWBC is not tied to any specific robot. Instead, it internalizes a broad distribution of morphological and dynamical characteristics during training. By learning motion priors from diverse randomized embodiments, the policy acquires a strong structural bias that supports zero-shot transfer to previously unseen robots. Experiments on twelve simulated humanoids and seven real-world robots demonstrate the strong generalization and robustness of the resulting universal controller.
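One way to picture the semantically aligned observation and action spaces is a fixed global joint layout into which each robot's joints are scattered, with a mask marking which slots exist on that robot. The sketch below is a minimal illustration under that assumption; the joint names, the `GLOBAL_JOINTS` list, and the helpers `to_global` and `from_global` are hypothetical and do not come from the paper.

```python
import numpy as np

# Hypothetical global joint vocabulary; the actual layout is not given in this summary.
GLOBAL_JOINTS = [
    "left_hip_pitch", "left_knee", "left_ankle_pitch",
    "right_hip_pitch", "right_knee", "right_ankle_pitch",
    "waist_yaw", "waist_roll",
]
GLOBAL_INDEX = {name: i for i, name in enumerate(GLOBAL_JOINTS)}

def to_global(robot_joints, values):
    """Scatter robot-ordered joint values into the global layout; return (vector, mask)."""
    vec = np.zeros(len(GLOBAL_JOINTS))
    mask = np.zeros(len(GLOBAL_JOINTS), dtype=bool)
    for name, value in zip(robot_joints, values):
        idx = GLOBAL_INDEX[name]
        vec[idx] = value
        mask[idx] = True
    return vec, mask

def from_global(robot_joints, global_values):
    """Gather entries of a global vector back into the robot's own joint order."""
    return np.array([global_values[GLOBAL_INDEX[name]] for name in robot_joints])

# A hypothetical robot with a single waist joint and no ankle actuation.
robot_a = ["waist_yaw", "left_hip_pitch", "left_knee", "right_hip_pitch", "right_knee"]
q_a = np.array([0.1, -0.3, 0.6, -0.2, 0.5])

obs, mask = to_global(robot_a, q_a)
recovered = from_global(robot_a, obs)
print(mask.astype(int), recovered)
```

The same gather step could be applied to a policy's global action vector, so robots with fewer joints simply ignore the unused slots.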

Paper Structure

This paper contains 35 sections, 4 theorems, 42 equations, 9 figures, and 7 tables.

Introduction
Related Work
Cross-Embodiment Learning for Legged Robots
Humanoid Whole-Body Control
Method
Physics-Consistent Morphological Randomization
Parameterizing a template robot.
Reparameterizing the link space.
Reparameterizing the joint space.
Universal Cross-Embodiment Representation
Joint space semantic alignment.
Graph-based morphology description.
Cross-Humanoid Learning
Observation.
GCN policy encoder.
...and 20 more sections

Key Result

Lemma 3.2

If $\mathbf{J} \succ 0$, then there exists a unique upper-triangular matrix $\mathbf{L}$ with positive diagonal entries such that

Figures (9)

Figure 1: Zero-shot generalization and real-world humanoid capabilities enabled by XHugWBC's generalist policy. First row: robust zero-shot generalization across seven humanoids with diverse DoFs, dynamic characteristics, and morphological structures. Second row: flexible teleoperation using XHugWBC enables long-horizon whole-body loco-manipulation tasks. Website: https://xhugwbc.github.io.
Figure 2: Training framework of XHugWBC. (a) Data generation: physics-consistent morphological randomization produces diverse and physically meaningful embodiments. (b) Universal embodiment representation: robot-specific states are projected into a global joint space, upon which an embodiment graph is constructed. (c) Policy learning: the generalist policy uses a GCN- or Transformer-based encoder together with a state estimator. Deployment: the learned policy generalizes zero-shot to seven humanoid robots with different kinematic, dynamic, and morphological structures.
Figure 3: Comparing training curves of the fine-tuned policies with the generalist policy and specialist policies.
Figure 4: t-SNE visualization of the Transformer latent representation. "-DoF" denotes the number of waist-joint DoFs. The arrow ($\gets$) indicates the direction of increasing robot mass.
Figure 5: Zero-shot survival rate comparison across multiple baselines. All policies are trained under the same training protocol. Naive Random is trained on data generated by the naive morphological randomization method, whereas XHugWBC, MetaMorph, and MorAL are trained using the physics-consistent morphological randomization.
...and 4 more figures

Theorems & Definitions (5)

Definition 3.1: Physics-Consistent Inertial Parameters
Lemma 3.2: Cholesky-Level Parameterization
Lemma 3.3: Affine Transformation of Pseudo-Inertia
Lemma 3.4: $\mathbb{R}^{10}$ Bijective Mapping of Inertia
Proposition 3.5: Smooth Physics-Consistent Randomization
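
The definition and lemmas listed above concern parameterizing inertial parameters so that randomized samples stay physically consistent. The sketch below illustrates one such scheme from the inertia-identification literature, a log-Cholesky-style map from an unconstrained vector in $\mathbb{R}^{10}$ to a positive-definite 4x4 pseudo-inertia matrix. The factorization convention ($\mathbf{J} = \mathbf{U}\mathbf{U}^\top$ with upper-triangular $\mathbf{U}$), the parameter ordering, and the function names are assumptions, since the lemma statements are not reproduced in this summary; the paper's exact construction may differ.

```python
import numpy as np

def theta_to_pseudo_inertia(theta):
    """Map any theta in R^10 to a positive-definite 4x4 matrix J = U U^T (assumed convention)."""
    alpha, d1, d2, d3, s12, s23, s13, t1, t2, t3 = theta
    upper = np.exp(alpha) * np.array([
        [np.exp(d1), s12,        s13,        t1],
        [0.0,        np.exp(d2), s23,        t2],
        [0.0,        0.0,        np.exp(d3), t3],
        [0.0,        0.0,        0.0,        1.0],
    ])
    return upper @ upper.T

def unpack_pseudo_inertia(pseudo_inertia):
    """Recover mass, center of mass, and rotational inertia (about the link origin)."""
    sigma = pseudo_inertia[:3, :3]        # second-moment block
    first_moment = pseudo_inertia[:3, 3]  # mass times center of mass
    mass = pseudo_inertia[3, 3]
    inertia = np.trace(sigma) * np.eye(3) - sigma
    return mass, first_moment / mass, inertia

rng = np.random.default_rng(0)
theta_nominal = np.zeros(10)                              # hypothetical template link
theta_sample = theta_nominal + 0.3 * rng.normal(size=10)  # unconstrained perturbation

J = theta_to_pseudo_inertia(theta_sample)
mass, com, inertia = unpack_pseudo_inertia(J)
print(mass, com, np.linalg.eigvalsh(inertia))
```

Because the diagonal of the triangular factor is exponentiated, every perturbation of `theta_sample` yields a positive mass and principal moments that satisfy the triangle inequalities, so no rejection step is needed when sampling.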
