
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction

Lujie Yang, Xiaoyu Huang, Zhen Wu, Angjoo Kanazawa, Pieter Abbeel, Carmelo Sferrazza, C. Karen Liu, Rocky Duan, Guanya Shi

TL;DR

OmniRetarget tackles the critical data bottleneck in humanoid loco-manipulation by introducing an interaction-mesh–based, constraint-driven retargeting engine that preserves robot–object–terrain interactions while minimizing Laplacian deformation. This yields high-quality, diverse trajectory references from a single demonstration, enabling proprioceptive RL with minimal rewards to achieve long-horizon tasks and zero-shot sim-to-real transfer on hardware. The approach is complemented by a broad augmentation strategy and open-source data, and it demonstrates better kinematic fidelity and downstream learning performance than established baselines. Collectively, OmniRetarget shifts the emphasis from reward engineering to principled, interaction-aware reference generation, accelerating the development of capable humanoid policies.

Abstract

A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8 hours of trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
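The core objective named in the abstract, minimizing Laplacian deformation between the human and robot interaction meshes under hard constraints, can be illustrated with a small NumPy sketch. It is a simplified stand-in, not the authors' solver: it optimizes 3D keypoint positions directly rather than robot joint configurations, uses uniform neighbor weights, assumes the mesh connectivity is already given as an edge list (every vertex must have at least one neighbor), and models the constraints only as pinned vertices (for example, a foot contact held in place). The function names `uniform_laplacian`, `deformation_energy`, and `retarget_points` are hypothetical.

```python
import numpy as np

def uniform_laplacian(num_vertices, edges):
    """Uniform graph Laplacian: L[i, i] = 1, L[i, j] = -1/deg(i) for neighbors j.
    Assumes every vertex has at least one neighbor."""
    adj = np.zeros((num_vertices, num_vertices))
    for a, b in edges:
        adj[a, b] = adj[b, a] = 1.0
    deg = adj.sum(axis=1, keepdims=True)
    return np.eye(num_vertices) - adj / deg

def deformation_energy(L, source_pts, target_pts):
    """Sum of squared differences between the Laplacian coordinates
    (vertex minus mean of its neighbors) of two meshes with shared connectivity."""
    diff = L @ np.asarray(source_pts, float) - L @ np.asarray(target_pts, float)
    return float(np.sum(diff ** 2))

def retarget_points(L, source_pts, pinned):
    """Minimize ||L X - L P||^2 over X subject to X[k] = pinned[k].

    Hypothetical point-space simplification: the pinned vertices play the
    role of hard constraints; the remaining vertices are solved by linear
    least squares after eliminating the pinned ones."""
    P = np.asarray(source_pts, float)
    n = P.shape[0]
    c = sorted(pinned)
    f = [i for i in range(n) if i not in pinned]
    X = np.zeros_like(P)
    X[c] = np.array([pinned[k] for k in c], float).reshape(-1, 3)
    if f:
        # Move the known (pinned) contribution to the right-hand side.
        rhs = L @ P - L[:, c] @ X[c]
        X[f] = np.linalg.lstsq(L[:, f], rhs, rcond=None)[0]
    return X
```

As a usage example, take a fully connected tetrahedron and pin one vertex one unit to the right. Because Laplacian coordinates are translation invariant, the unique minimizer shifts the whole mesh rigidly with zero deformation, while uniformly scaling the mesh by 2 incurs nonzero deformation energy.

```python
import itertools
P = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
L = uniform_laplacian(4, itertools.combinations(range(4), 2))
X = retarget_points(L, P, {0: [1.0, 0.0, 0.0]})  # whole mesh shifted by +1 in x
print(deformation_energy(L, P, X))                # ~0.0
print(deformation_energy(L, P, 2 * P))            # 4.0
```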

Paper Structure

This paper contains 33 sections, 16 equations, 9 figures, 3 tables, 5 algorithms.

Figures (9)

  • Figure 1: OmniRetarget overview. Human demonstrations are retargeted to the robot via interaction-mesh–based constrained optimization. Each spatial and shape augmentation is solved as a new optimization, producing diverse trajectories that serve as references for RL training with minimal reward design and domain randomization, enabling zero-shot transfer to real-world humanoids.
  • Figure 2: Cross-embodiment robot-object-terrain interaction.
  • Figure 3: OmniRetarget generates systematic variations of (a) terrain height, (b) object initial pose, and (c) object shape from a single human demonstration, with optimized motions in simulation (top) transferring consistently to hardware (bottom).
  • Figure 4: Additional hardware results showing diverse, agile and human-like behaviors.
  • Figure 5: Hardware results showing a highly dynamic wall-flip motion. The robot reaches a maximum linear velocity of $3.5$ m/s and a peak angular velocity of $15$ rad/s.
  • ...and 4 more figures