DynaRetarget: Dynamically-Feasible Retargeting using Sampling-Based Trajectory Optimization

Victor Dhedin, Ilyass Taouil, Shafeef Omar, Dian Yu, Kun Tao, Angela Dai, Majid Khadiv

TL;DR

DynaRetarget tackles the challenge of converting human demonstrations into dynamically feasible humanoid loco-manipulation trajectories by combining IK-based retargeting with Sampling-Based Trajectory Optimization (SBTO) that incrementally grows the optimization horizon. This long-horizon refinement addresses the myopic and brittle behavior of prior SBMPC-based retargeting approaches, producing smoother, physically consistent trajectories that generalize across varied object properties. The refined trajectories are then used to train RL tracking policies with domain randomization, achieving robust zero-shot transfer to real humanoid hardware and enabling scalable synthetic data generation for loco-manipulation. The results show higher retargeting success rates than state-of-the-art baselines and clear benefits for downstream RL learning, highlighting the method’s practical impact for real-world robotics and data-driven policy development.

Abstract

In this paper, we introduce DynaRetarget, a complete pipeline for retargeting human motions to humanoid control policies. The core component of DynaRetarget is a novel Sampling-Based Trajectory Optimization (SBTO) framework that refines imperfect kinematic trajectories into dynamically feasible motions. SBTO incrementally advances the optimization horizon, enabling optimization over the entire trajectory for long-horizon tasks. We validate DynaRetarget by successfully retargeting hundreds of humanoid-object demonstrations and achieving higher success rates than the state of the art. The framework also generalizes across varying object properties, such as mass, size, and geometry, using the same tracking objective. This ability to robustly retarget diverse demonstrations opens the door to generating large-scale synthetic datasets of humanoid loco-manipulation trajectories, addressing a major bottleneck in real-world data collection.
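The idea of incrementally advancing the optimization horizon can be illustrated with a small sketch. It is not the authors' SBTO implementation: it assumes a toy 1-D point mass in place of the humanoid simulator, a cross-entropy-style elite update as a stand-in for the paper's sampling update, and hypothetical names and parameters throughout (`rollout_cost`, `growing_horizon_cem`, `start`, `step`, `sigma_init`, and a `sigma_min` that here simply floors the sampling standard deviation). What it does show is that every knot from the start of the trajectory up to the current horizon keeps being resampled after each horizon increment, rather than only a sliding window.

```python
import math
import random


def rollout_cost(controls, reference, dt=0.05):
    """Simulate a toy 1-D point mass; return the summed squared tracking error."""
    pos, vel, cost = 0.0, 0.0, 0.0
    for u, ref in zip(controls, reference):  # zip stops at the current horizon
        vel += u * dt   # semi-implicit Euler step on velocity
        pos += vel * dt
        cost += (pos - ref) ** 2
    return cost


def growing_horizon_cem(reference, start=10, step=10, iters_per_stage=30,
                        n_samples=64, n_elite=8, sigma_init=2.0,
                        sigma_min=0.05, seed=0):
    """Hypothetical sketch: sampling-based optimization with a growing horizon."""
    rng = random.Random(seed)
    total = len(reference)
    mean = [0.0] * total          # one control knot per reference step
    sigma = [sigma_init] * total  # per-knot sampling standard deviation
    horizon = min(start, total)
    while True:
        for _ in range(iters_per_stage):
            # Sample full control sequences over every knot in [0, horizon).
            samples = [[rng.gauss(mean[k], sigma[k]) for k in range(horizon)]
                       for _ in range(n_samples)]
            samples.append(mean[:horizon])  # keep the current guess in the pool
            samples.sort(key=lambda s: rollout_cost(s, reference))
            elites = samples[:n_elite]
            for k in range(horizon):
                vals = [e[k] for e in elites]
                m = sum(vals) / n_elite
                var = sum((v - m) ** 2 for v in vals) / n_elite
                mean[k] = m
                # Assumed role of sigma_min: keep early knots exploring.
                sigma[k] = max(math.sqrt(var), sigma_min)
        if horizon == total:
            return mean
        horizon = min(horizon + step, total)  # advance the horizon
```

Calling `growing_horizon_cem(reference)` with a list of reference positions returns one control value per knot; passing that list to `rollout_cost` gives the tracking error of the refined trajectory. Flooring the standard deviation is one simple way to let the first knots keep improving once later parts of the trajectory enter the objective.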

Paper Structure (16 sections, 4 equations, 7 figures, 5 tables, 2 algorithms)

Figures (7)

  • Figure 1: Real-world humanoid loco-manipulation behaviors enabled by DynaRetarget. Demonstrations retargeted using our framework are physically consistent and zero-shot transferable to the real robot, enabling diverse contact-rich tasks involving interactions using feet and hands, such as kicking, lifting, pushing, and object handover.
  • Figure 2: DynaRetarget overview. Given a human–object demonstration, we first perform IK-based retargeting to obtain a kinematically-feasible robot–object demonstration. Due to morphological differences between the human and the robot, this process can produce imperfections, for instance missing contacts (red circle). To address these issues, we use the kinematic trajectory as a reference for SBTO, which refines the trajectory and ensures its physical consistency, including removing missing contacts (green circle). The motion is then used to train an RL tracking policy in simulation with domain randomization. Finally, the learned policy is transferred zero-shot to our humanoid robot in the real world.
  • Figure 3: Trajectory snapshots at $t^0 = 1$ s for the different baselines. Top row: SBTO; the box position error decreases across successive increments. Bottom row: FHTO with different horizons and the SPIDER baseline. The reference is shown as a transparent overlay.
  • Figure 4: Evolution of the object position error at time $t^0$ during the optimization. With SBTO, the object position error decreases steadily for about $200$ iterations. This shows that the first knots are still being optimized even after $10$ increments of the horizon, which corresponds to an effective horizon of around $3.4$ s (see vertical and horizontal red lines). The other baselines fail, as their position error remains too high.
  • Figure 5: Effective horizon of SBTO for a parameter sweep over $\sigma_{min}$ and $\alpha_\Sigma$, averaged over $3$ runs. The effective horizon increases column by column, as $\sigma_{min}$ increases, whereas it stays almost identical for different $\alpha_\Sigma$ values.
  • ...and 2 more figures