HumanMimic: Learning Natural Locomotion and Transitions for Humanoid Robot via Wasserstein Adversarial Imitation

Annan Tang; Takuma Hiraoka; Naoki Hiraoka; Fan Shi; Kento Kawaharazuka; Kunio Kojima; Kei Okada; Masayuki Inaba

TL;DR

The paper introduces a Wasserstein adversarial imitation framework for humanoid locomotion, using a soft boundary constraint to stabilize training. It couples a unified primitive-skeleton motion retargeting pipeline with velocity-conditioned RL, enabling the full-sized humanoid JAXON to imitate diverse human locomotion and transition seamlessly between gaits as the velocity command changes. Key contributions are the soft-boundary Wasserstein-1 critic, a practical motion retargeting method, and demonstrations of natural gait patterns and transitions in high-fidelity simulation, along with sim-to-sim transfer as preparation for real-world deployment. By addressing mode collapse and cross-morphology transfer, the work could reduce reward engineering and improve the robustness of humanoid locomotion in real-world scenarios.

Abstract

Transferring human motion skills to humanoid robots remains a significant challenge. In this study, we introduce a Wasserstein adversarial imitation learning system that allows humanoid robots to replicate natural whole-body locomotion patterns and execute seamless transitions by mimicking human motions. First, we present a unified primitive-skeleton motion retargeting method to mitigate morphological differences between arbitrary human demonstrators and humanoid robots. An adversarial critic component is integrated with Reinforcement Learning (RL) to guide the control policy to produce behaviors aligned with the data distribution of mixed reference motions. Additionally, we employ a specific Integral Probability Metric (IPM), namely the Wasserstein-1 distance, with a novel soft boundary constraint to stabilize the training process and prevent mode collapse. Our system is evaluated on the full-sized humanoid JAXON in simulation. The resulting control policy demonstrates a wide range of locomotion patterns, including standing, push-recovery, squat walking, human-like straight-leg walking, and dynamic running. Notably, even though the demonstration dataset contains no transition motions, the robot shows an emergent ability to transition naturally between distinct locomotion patterns as the desired speed changes.
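For orientation, the Wasserstein-1 distance named in the abstract is commonly estimated through its Kantorovich-Rubinstein dual, which is what lets a learned critic stand in for the distance. The block below is a sketch in assumed notation, not the paper's own equations: $P_r$ (reference motion distribution), $P_\pi$ (policy-generated motion distribution), $D$ (critic), and $\eta$ (a scale factor) are symbols introduced here for illustration. The last line shows one possible way to softly bound the critic output, with a tanh squashing; it is an illustrative assumption, since this summary does not spell out the exact form of the soft boundary constraint.

```latex
% Kantorovich-Rubinstein dual of the Wasserstein-1 distance:
% the supremum runs over all 1-Lipschitz critics D.
W_1(P_r, P_\pi) \;=\; \sup_{\|D\|_L \le 1}
    \mathbb{E}_{x \sim P_r}\!\left[ D(x) \right]
  - \mathbb{E}_{x \sim P_\pi}\!\left[ D(x) \right]

% Corresponding critic loss (minimized over critic parameters \theta):
\mathcal{L}(\theta) \;=\;
    \mathbb{E}_{x \sim P_\pi}\!\left[ D_\theta(x) \right]
  - \mathbb{E}_{x \sim P_r}\!\left[ D_\theta(x) \right]

% Illustrative soft-bounded variant (assumption): squash critic outputs
% into (-1, 1) so the loss cannot grow without bound.
\mathcal{L}_{\text{soft}}(\theta) \;=\;
    \mathbb{E}_{x \sim P_\pi}\!\left[ \tanh\!\big(\eta\, D_\theta(x)\big) \right]
  - \mathbb{E}_{x \sim P_r}\!\left[ \tanh\!\big(\eta\, D_\theta(x)\big) \right]
```

In practice the Lipschitz condition in the first line is enforced only approximately, for example with a gradient penalty or weight clipping; which mechanism the paper uses is not stated in this summary.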

Paper Structure (10 sections, 9 equations, 7 figures, 1 table)

Introduction
Related Works
Motion Retargeting
Wasserstein Adversarial Imitation
Velocity-Conditioned Reinforcement Learning
Wasserstein Critic
Experiment
Implementation Details
Evaluation
Conclusion

Figures (7)

Figure 1: Our Wasserstein adversarial imitation learning system enables a full-sized humanoid to exhibit various human-like natural locomotion behaviors and achieve seamless transitions as the velocity command changes.
Figure 2: Binding the humanoid JAXON and the MoCap skeleton involves merging their bones into a common primitive skeleton.
Figure 3: Wasserstein Adversarial Imitation Framework. Given the robot's proprioceptive state and base velocity commands, the policy network predicts the joint position targets. A PD controller converts these targets into torques to actuate the robot. Using the reference motion dataset and policy-generated motion dataset, the Wasserstein critic updates its parameters through the soft-boundary Wasserstein-1 loss during training and predicts the style reward during roll-out. The style reward $r^S$ is combined with the velocity reward $r^V$ to guide policy training.
Figure 4: Snapshots of various natural locomotion behaviors learned by the humanoid JAXON. As the velocity command increases from 0 m/s to 5 m/s, the robot exhibits seamless transitions from standing to dynamic running.
Figure 5: Top: the velocity tracking curve, where the velocity command increases from 0.0 m/s to 5.0 m/s with a constant acceleration of 0.1 $\text{m/s}^2$. Middle and bottom: the contact forces in the z-direction for the left and right feet during the transition from standing to running.
...and 2 more figures
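
The control loop described in the Figure 3 caption and the command profile in the Figure 5 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the PD gains `kp` and `kd`, and the reward weights `w_style` and `w_velocity` are hypothetical, and the weighted sum is only one plausible way to combine $r^S$ and $r^V$. The ramp parameters (0.1 m/s^2 up to 5.0 m/s) come from the Figure 5 caption.

```python
from typing import List, Sequence


def pd_torques(q_target: Sequence[float], q: Sequence[float],
               q_dot: Sequence[float], kp: float, kd: float) -> List[float]:
    """Convert joint position targets into torques with a PD law.

    tau_i = kp * (q_target_i - q_i) - kd * q_dot_i
    The gains kp and kd are hypothetical; the source gives no values.
    """
    return [kp * (qt - qi) - kd * qdi
            for qt, qi, qdi in zip(q_target, q, q_dot)]


def combined_reward(r_style: float, r_velocity: float,
                    w_style: float = 0.5, w_velocity: float = 0.5) -> float:
    """Combine the style reward r^S and velocity reward r^V.

    A weighted sum with hypothetical weights; the actual combination
    rule is not stated in the caption.
    """
    return w_style * r_style + w_velocity * r_velocity


def velocity_command(t: float, accel: float = 0.1, v_max: float = 5.0) -> float:
    """Velocity command ramp from Figure 5: constant acceleration from
    0 m/s, saturating at v_max."""
    return min(v_max, max(0.0, accel * t))


if __name__ == "__main__":
    # One joint, target 1.0 rad, currently at 0.5 rad moving at 0.1 rad/s.
    print(pd_torques([1.0], [0.5], [0.1], kp=10.0, kd=1.0))
    print(combined_reward(r_style=1.0, r_velocity=0.0))
    print(velocity_command(10.0), velocity_command(60.0))
```

With the ramp from Figure 5, the command reaches its 5.0 m/s ceiling after 50 s, so the whole standing-to-running transition is driven by a single slowly varying input.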
