Opt2Skill: Imitating Dynamically-feasible Whole-Body Trajectories for Versatile Humanoid Loco-Manipulation

Fukang Liu; Zhaoyuan Gu; Yilin Cai; Ziyi Zhou; Hyunyoung Jung; Jaehwi Jang; Shijie Zhao; Sehoon Ha; Yue Chen; Danfei Xu; Ye Zhao

TL;DR

Opt2Skill is a reinforcement learning (RL) framework guided by trajectory optimization (TO). It generates dynamically feasible whole-body reference trajectories, including joint torques, using full-order dynamics and differential dynamic programming (DDP), then trains RL policies to track these references on a Digit humanoid. Compared with baselines trained on human demonstrations and inverse kinematics (IK)-based references, the method achieves better tracking accuracy and higher task success rates, especially in contact-rich and rough-terrain tasks, and it transfers from simulation to hardware. A key finding is that including torque and contact-force information from the TO references improves force tracking and stability in loco-manipulation. Hardware experiments validate the approach across diverse loco-manipulation tasks.

Abstract

Humanoid robots are designed to perform diverse loco-manipulation tasks. However, they face challenges due to their high-dimensional and unstable dynamics, as well as the complex, contact-rich nature of the tasks. Model-based optimal control methods offer flexibility to define precise motion but are limited by high computational complexity and the need for accurate contact sensing. On the other hand, reinforcement learning (RL) handles high-dimensional spaces with strong robustness but suffers from inefficient learning, unnatural motion, and sim-to-real gaps. To address these challenges, we introduce Opt2Skill, an end-to-end pipeline that combines model-based trajectory optimization with RL to achieve robust whole-body loco-manipulation. Opt2Skill generates dynamically feasible and contact-consistent reference motions for the Digit humanoid robot using differential dynamic programming (DDP) and trains RL policies to track these optimal trajectories. Our results demonstrate that Opt2Skill outperforms baselines that rely on human demonstrations and inverse kinematics-based references, both in motion tracking and in task success rates. Furthermore, we show that incorporating trajectories with torque information improves contact force tracking in contact-involved tasks, such as wiping a table. We have successfully transferred our approach to real-world applications.

Paper Structure (21 sections, 2 equations, 6 figures, 4 tables)

Introduction
Related Work
Model-Based Trajectory Optimization for Humanoids
Reinforcement Learning for Humanoids
Methods
Whole-Body Trajectory Optimization
RL-Based Imitation of Dynamically Feasible Trajectories
Problem Formulation
Imitation Policy
Reward Functions, Domain Randomization, and Curriculum
Results
Robot System
Comparison with Different Datasets
Dataset Generation
Evaluation Metrics
...and 6 more sections

Figures (6)

Figure 1: The proposed Opt2Skill framework enables a Digit humanoid robot to perform various loco-manipulation tasks by mimicking optimal model-based reference trajectories in real-world scenarios.
Figure 2: Overall structure of the Opt2Skill framework. (a) We first generate structured, dynamically feasible reference trajectories using trajectory optimization with contact constraints, torque limits, and task-specific objectives. (b) Each trajectory contains joint angles, joint velocities, body position, orientation, linear and angular velocities, and dynamic-relevant quantities such as joint torques and interaction forces. (c) These trajectories serve as supervision signals to train RL policies that predict joint-level targets, tracked by a low-level PD controller. The resulting policies internalize control strategies grounded in model-based optimization while remaining reactive and robust to disturbances, sensor noise, and dynamics variability, enabling direct deployment to real hardware.
Figure 3: Success rate comparison across different terrains. Policies trained on human-retargeted, IK-based, and TO-generated trajectories (with matched parameters) are evaluated on stair heights (left) and slope angles (right).
Figure 4: Contact force profiles across all four policies under varying reference force levels. Each subplot shows $10$ trajectories per policy.
Figure 5: Snapshots of hardware experiments with data plots. (a) Flat-ground locomotion with accurate base position tracking. (b-c) Walking up a stair and a ramp ($\sim19.5^\circ$) with foot $z$-position plots indicating elevation. (d) Desk object reaching, with plots of end-effector tracking. (e) Box pick-up from one shelf layer to another. Note that in (d) and (e), end-effector positions are expressed relative to the root. Dashed lines indicate the reference trajectories.
...and 1 more figure
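
Figure 2 (b) and (c) describe what a reference trajectory contains and how the policy's joint-level targets are tracked by a low-level PD controller. The following is a minimal sketch of those two pieces, not the authors' implementation. The class name `ReferenceFrame`, the function name `pd_torque`, the quaternion orientation format, the damping-toward-zero-velocity PD form, and all gains and limits are assumptions made for illustration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ReferenceFrame:
    """One time step of a TO reference; fields follow the Figure 2(b) caption."""
    joint_pos: np.ndarray      # joint angles
    joint_vel: np.ndarray      # joint velocities
    base_pos: np.ndarray       # body position, shape (3,)
    base_quat: np.ndarray      # body orientation (quaternion format is an assumption)
    base_lin_vel: np.ndarray   # body linear velocity, shape (3,)
    base_ang_vel: np.ndarray   # body angular velocity, shape (3,)
    joint_torque: np.ndarray   # joint torques from the optimizer
    contact_force: np.ndarray  # interaction forces


def pd_torque(q_target, q, qd, kp, kd, tau_limit):
    """Hypothetical low-level PD law: track a joint target, damp joint velocity.

    The result is clipped to +/- tau_limit (an assumed actuator bound).
    """
    tau = kp * (q_target - q) - kd * qd
    return np.clip(tau, -tau_limit, tau_limit)


# Example: two joints, the first far from its target so its torque saturates.
tau = pd_torque(
    q_target=np.array([1.0, 0.0]),
    q=np.zeros(2),
    qd=np.zeros(2),
    kp=100.0,
    kd=1.0,
    tau_limit=50.0,
)
```

In this sketch the policy would output `q_target` at each control step, while the torque and force fields of `ReferenceFrame` would be available only as supervision during training.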
