ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning

Siheng Zhao, Yanjie Ze, Yue Wang, C. Karen Liu, Pieter Abbeel, Guanya Shi, Rocky Duan

TL;DR

ResMimic is a two-stage residual learning framework for precise and expressive humanoid control from human motion data. It shows substantial gains in task success, training efficiency, and robustness over strong baselines.

Abstract

Humanoid whole-body loco-manipulation promises transformative capabilities for daily service and warehouse tasks. While recent advances in general motion tracking (GMT) have enabled humanoids to reproduce diverse human motions, these policies lack the precision and object awareness required for loco-manipulation. To this end, we introduce ResMimic, a two-stage residual learning framework for precise and expressive humanoid control from human motion data. First, a GMT policy, trained on large-scale human-only motion, serves as a task-agnostic base for generating human-like whole-body movements. An efficient but precise residual policy is then learned to refine the GMT outputs to improve locomotion and incorporate object interaction. To further facilitate efficient training, we design (i) a point-cloud-based object tracking reward for smoother optimization, (ii) a contact reward that encourages accurate humanoid body-object interactions, and (iii) a curriculum-based virtual object controller to stabilize early training. We evaluate ResMimic in both simulation and on a real Unitree G1 humanoid. Results show substantial gains in task success, training efficiency, and robustness over strong baselines. Videos are available at https://resmimic.github.io/ .
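The two-stage idea in the abstract can be illustrated with a minimal sketch: a residual policy outputs a correction that is added to the base GMT action, and object tracking is scored by comparing points sampled on the object at its current and reference poses. The abstract does not give formulas, so everything here is an assumption made for illustration: the function names, the additive composition with a `residual_scale` factor, the exponential reward shape, and the `sharpness` constant.

```python
import numpy as np

def compose_action(base_action, residual_action, residual_scale=1.0):
    """Assumed composition: final action = base (GMT) action + scaled residual."""
    base = np.asarray(base_action, dtype=float)
    residual = np.asarray(residual_action, dtype=float)
    return base + residual_scale * residual

def sample_box_points(half_extents, n=64, seed=0):
    """Sample n points inside a box, expressed in the object's local frame."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-1.0, 1.0, size=(n, 3)) * np.asarray(half_extents, dtype=float)

def transform_points(points, rotation, translation):
    """Map local points to the world frame with a 3x3 rotation and a translation."""
    return points @ np.asarray(rotation, dtype=float).T + np.asarray(translation, dtype=float)

def point_cloud_tracking_reward(points_local, rot, pos, rot_ref, pos_ref, sharpness=10.0):
    """Hypothetical reward: exp(-sharpness * mean squared point distance).

    Comparing the same points at both poses folds position and orientation
    error into one term, with no separate weighting between the two.
    """
    current = transform_points(points_local, rot, pos)
    reference = transform_points(points_local, rot_ref, pos_ref)
    mean_sq_dist = np.mean(np.sum((current - reference) ** 2, axis=1))
    return float(np.exp(-sharpness * mean_sq_dist))

# Usage: a perfectly tracked object scores 1.0; a 10 cm offset scores lower.
points = sample_box_points([0.2, 0.15, 0.1])
identity = np.eye(3)
reward_exact = point_cloud_tracking_reward(points, identity, [0, 0, 0], identity, [0, 0, 0])
reward_offset = point_cloud_tracking_reward(points, identity, [0.1, 0, 0], identity, [0, 0, 0])
action = compose_action([1.0, 2.0], [0.5, -0.5])
```

A single point-based term avoids hand-tuning the balance between translation and rotation errors, which is one plausible reading of why the abstract describes the reward as giving smoother optimization.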

Paper Structure

This paper contains 26 sections, 4 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Visualization of imperfect humanoid–object interaction data caused by the embodiment gap during retargeting: (a) hand–chair penetration; (b) hand–box floating contact.
  • Figure 2: Overview of ResMimic: (1) A general motion tracking policy is trained on large-scale human motion data to serve as the base policy. (2) A task-specific residual policy is efficiently trained with a virtual force and with object and contact rewards to refine the base policy outputs. (3) During real-world deployment, the combined policy is employed for loco-manipulation control.
  • Figure 3: We deploy ResMimic on Unitree G1 with MoCap-based object states. (a) Lifting a box from random object initial poses across 11 trials; (b) Autonomous consecutive kneeling and box lifting; (c) Reactive behavior to external perturbations.
  • Figure 4: Comparison between IsaacGym and MuJoCo results for the Chair (left) and Carry (right) tasks. The corresponding curves quantify object tracking error for Train from Scratch, Finetune, and ResMimic.
  • Figure 5: Real-world qualitative results comparing ResMimic against all other baselines.
  • ...and 2 more figures