One-shot Humanoid Whole-body Motion Learning

Hao Huang, Geeta Chandra Raju Bethala, Shuaihang Yuan, Congcong Wen, Anthony Tzes, Yi Fang

TL;DR

This work tackles the data bottleneck in learning expressive humanoid whole-body motion by showing that a policy can be learned from a single non-walking target sample supplemented by numerous walking motions. It introduces a one-shot pipeline that uses order-preserving optimal transport (OPOT) to align walking sequences with the target, interpolates along geodesics to generate intermediate poses, and enforces collision-free configurations before retargeting to a humanoid for reinforcement learning (RL) training in simulation. Key contributions include the OPOT-based sequence alignment, geodesic pose sampling, and differentiable collision-aware optimization on the pose-skeleton manifold, which together enable policy learning without training neural networks for motion generation. The approach outperforms baselines on CMU MoCap benchmarks in sim-to-sim transfer and reduces the data collection burden, offering a practical path to data-efficient, expressive humanoid control.

Abstract

Whole-body humanoid motion represents a cornerstone challenge in robotics, integrating balance, coordination, and adaptability to enable human-like behaviors. However, existing methods typically require multiple training samples per motion category, which makes collecting high-quality human motion datasets both labor-intensive and costly. To address this, we propose a novel approach that trains effective humanoid motion policies using only a single non-walking target motion sample alongside readily available walking motions. The core idea is to use order-preserving optimal transport (OPOT) to compute distances between walking and non-walking sequences, then interpolate along geodesics to generate new intermediate pose skeletons. These skeletons are optimized for collision-free configurations, retargeted to the humanoid, and integrated into a simulated environment for policy training via reinforcement learning. Experimental evaluations on the CMU MoCap dataset demonstrate that our method consistently outperforms baselines across the reported metrics. Code will be released upon acceptance.
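The alignment and interpolation steps described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes poses are flattened joint-position vectors, uses a Sinkhorn-style kernel with a temporal-order prior in the spirit of order-preserving optimal transport, and replaces the geodesic sampling with a simple straight-line interpolation toward the barycentric projection of each walking frame. The function names `opot_plan` and `interpolate_frames`, the hyperparameters `lam1`, `lam2`, `sigma`, and the toy trajectories are all hypothetical.

```python
import numpy as np

def opot_plan(X, Y, lam1=1.0, lam2=0.1, sigma=1.0, n_iter=500):
    """Transport plan between two frame sequences with a temporal-order prior.

    X: (N, d) source frames, Y: (M, d) target frames.
    lam1, lam2 and sigma are illustrative values, not taken from the paper.
    """
    N, M = len(X), len(Y)
    # Pairwise squared Euclidean cost between frames.
    D = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    # Normalized temporal position of every frame in its sequence.
    i = (np.arange(1, N + 1) / N)[:, None]
    j = (np.arange(1, M + 1) / M)[None, :]
    # Reward for transporting mass close to the temporal diagonal.
    S = lam1 / ((i - j) ** 2 + 1.0)
    # Gaussian prior on the distance of cell (i, j) to the diagonal line.
    ell = np.abs(i - j) / np.sqrt(1.0 / N**2 + 1.0 / M**2)
    log_P = -ell**2 / (2.0 * sigma**2) - np.log(sigma * np.sqrt(2.0 * np.pi))
    # Build the kernel in the log domain; a global rescaling of K leaves
    # the resulting plan unchanged. Long sequences or large costs would
    # need a fully log-domain Sinkhorn to avoid underflow.
    log_K = log_P + (S - D) / lam2
    K = np.exp(log_K - log_K.max())
    a = np.full(N, 1.0 / N)  # uniform mass on source frames
    b = np.full(M, 1.0 / M)  # uniform mass on target frames
    u = np.ones(N)
    for _ in range(n_iter):  # Sinkhorn scaling iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    T = u[:, None] * K * v[None, :]
    return T, D

def interpolate_frames(X, Y, T, t):
    """Move each source frame a fraction t toward its transport-weighted target."""
    Y_bar = (T @ Y) / T.sum(axis=1, keepdims=True)  # barycentric projection
    return (1.0 - t) * X + t * Y_bar

# Toy 2-D trajectories standing in for pose sequences (hypothetical data).
steps = np.linspace(0.0, 1.0, 30)
walk = np.stack([steps, steps**2], axis=1)
target = walk[::-1].copy()  # same frames in reversed temporal order

T_same, D_same = opot_plan(walk, walk)
T_rev, D_rev = opot_plan(walk, target)
dist_same = float((T_same * D_same).sum())
dist_rev = float((T_rev * D_rev).sum())
mid = interpolate_frames(walk, target, T_rev, 0.5)
```

In this sketch the transport cost `(T * D).sum()` serves as the sequence distance. Because the prior penalizes matches far from the temporal diagonal, a sequence compared with its own reversal comes out farther away than a sequence compared with itself, even though both pairs contain the same set of frames.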

Paper Structure

This paper contains 17 sections, 21 equations, 2 figures, 5 tables, 1 algorithm.

Figures (2)

  • Figure 1: Given a sequence of walking motion pose skeletons and a target sequence comprising non-walking motions, we employ order-preserving optimal transport (OPOT) to compute the distance between these two sequences. Subsequently, we interpolate and sample novel pose skeletons along the geodesics connecting the walking and non-walking sequences. These sampled skeletons, together with the target sequence, are then retargeted to a humanoid robot and integrated into a simulated environment for training a whole-body motion policy via the Proximal Policy Optimization (PPO) algorithm.
  • Figure 2: Illustration of spheres for joints, capsules for bones, and line segment distance between two bones.
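The geometry named in Figure 2 can be sketched as follows, assuming each bone is modeled as a capsule (a line segment with a radius) and that two bones collide when the distance between their segments falls below the sum of their radii. The segment-to-segment routine follows the standard clamped closest-point construction from the collision-detection literature. It is plain NumPy rather than the differentiable formulation the paper refers to, and the names `segment_distance` and `capsule_penetration` as well as the example coordinates and radii are hypothetical.

```python
import numpy as np

def segment_distance(p1, q1, p2, q2, eps=1e-12):
    """Shortest distance between segments p1-q1 and p2-q2."""
    p1, q1, p2, q2 = (np.asarray(x, dtype=float) for x in (p1, q1, p2, q2))
    d1, d2, r = q1 - p1, q2 - p2, p1 - p2
    a, e, f = d1 @ d1, d2 @ d2, d2 @ r
    if a <= eps and e <= eps:
        s = t = 0.0  # both segments degenerate to points
    elif a <= eps:
        s, t = 0.0, np.clip(f / e, 0.0, 1.0)  # first segment is a point
    else:
        c = d1 @ r
        if e <= eps:
            t, s = 0.0, np.clip(-c / a, 0.0, 1.0)  # second segment is a point
        else:
            b = d1 @ d2
            denom = a * e - b * b  # zero when the segments are parallel
            s = np.clip((b * f - c * e) / denom, 0.0, 1.0) if denom > eps else 0.0
            t = (b * s + f) / e
            # Clamp t to the segment and recompute s for the clamped value.
            if t < 0.0:
                t, s = 0.0, np.clip(-c / a, 0.0, 1.0)
            elif t > 1.0:
                t, s = 1.0, np.clip((b - c) / a, 0.0, 1.0)
    closest1 = p1 + s * d1
    closest2 = p2 + t * d2
    return float(np.linalg.norm(closest1 - closest2))

def capsule_penetration(p1, q1, r1, p2, q2, r2):
    """Overlap depth of two capsules; zero when they do not touch."""
    return max(0.0, r1 + r2 - segment_distance(p1, q1, p2, q2))

# Two hypothetical bones crossing at right angles, 0.5 apart along z.
bone_a = ([-1.0, 0.0, 0.0], [1.0, 0.0, 0.0])
bone_b = ([0.0, -1.0, 0.5], [0.0, 1.0, 0.5])
gap = segment_distance(*bone_a, *bone_b)
depth = capsule_penetration(*bone_a, 0.3, *bone_b, 0.3)
```

For the two example bones the segment distance is 0.5, so capsules of radius 0.3 each overlap by about 0.1. A penalty of this form could be summed over non-adjacent bone pairs to score how far a sampled skeleton is from a collision-free configuration.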