HALO: Closing Sim-to-Real Gap for Heavy-loaded Humanoid Agile Motion Skills via Differentiable Simulation

Xingyi Wang, Chenyun Zhang, Weiji Xie, Chao Yu, Wei Song, Chenjia Bai, Shiqiang Zhu

Abstract

Humanoid robots deployed in real-world scenarios often need to carry unknown payloads, which introduce significant mismatch and degrade the effectiveness of simulation-to-reality reinforcement learning methods. To address this challenge, we propose a two-stage gradient-based system identification framework built on the differentiable simulator MuJoCo XLA. The first stage calibrates the nominal robot model using real-world data to reduce intrinsic sim-to-real discrepancies, while the second stage further identifies the mass distribution of the unknown payload. By explicitly reducing structured model bias prior to policy training, our approach enables zero-shot transfer of reinforcement learning policies to hardware under heavy-load conditions. Extensive simulation and real-world experiments demonstrate more precise parameter identification, improved motion tracking accuracy, and substantially enhanced agility and robustness compared to existing baselines. Project Page: https://mwondering.github.io/halo-humanoid/
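The two-stage idea in the abstract can be illustrated with a deliberately small sketch. It is not the authors' implementation: a 1-D point mass pushed by a known force stands in for the humanoid, a hand-derived sensitivity stands in for the gradients a differentiable simulator such as MuJoCo XLA would supply, and the "real-world" data are synthetic. All names and constants (`rollout`, `descend`, `two_stage_identify`, the masses, the step size) are assumptions made for illustration.

```python
# Toy illustration only: a 1-D point mass stands in for the humanoid, and a
# hand-derived sensitivity stands in for the simulator's automatic gradients.
DT = 0.01               # integration step [s] (assumed)
STEPS = 100             # rollout length (assumed)
FORCES = [1.0] * STEPS  # known push applied at every step [N] (assumed)


def rollout(mass):
    """Euler rollout from rest; returns velocities and d(velocity)/d(mass)."""
    v, dv_dm = 0.0, 0.0
    vels, sens = [], []
    for u in FORCES:
        v += DT * u / mass            # v_{t+1} = v_t + dt * u_t / m
        dv_dm -= DT * u / mass ** 2   # derivative of that update w.r.t. m
        vels.append(v)
        sens.append(dv_dm)
    return vels, sens


def loss_grad(mass, observed):
    """Gradient of the mean squared velocity-tracking error w.r.t. mass."""
    vels, sens = rollout(mass)
    terms = (2.0 * (v - o) * s for v, o, s in zip(vels, observed, sens))
    return sum(terms) / len(observed)


def descend(x0, grad_fn, lr=5.0, iters=1000):
    """Plain gradient descent on a single scalar parameter."""
    x = x0
    for _ in range(iters):
        x -= lr * grad_fn(x)
    return x


def two_stage_identify(true_base=2.0, true_payload=1.0, nominal_base=1.5):
    # Synthetic stand-ins for measured trajectories (unloaded and loaded).
    obs_unloaded, _ = rollout(true_base)
    obs_loaded, _ = rollout(true_base + true_payload)

    # Stage 1: calibrate the base model using unloaded data only.
    base_hat = descend(nominal_base, lambda m: loss_grad(m, obs_unloaded))

    # Stage 2: freeze the calibrated base and fit only the payload mass.
    # d(total mass)/d(payload) = 1, so the same gradient applies.
    payload_hat = descend(0.0, lambda p: loss_grad(base_hat + p, obs_loaded))
    return base_hat, payload_hat


base_hat, payload_hat = two_stage_identify()
print(f"base mass ~ {base_hat:.3f} kg, payload mass ~ {payload_hat:.3f} kg")
```

In this toy, the loaded trajectory alone constrains only the sum of base and payload mass, so fitting both at once would be ambiguous; fixing the base from unloaded data first removes that ambiguity. A real pipeline would differentiate through the simulator's rollout rather than rely on a hand-written sensitivity.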

Paper Structure (24 sections, 10 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Performance of HALO in heavy-loaded scenarios. (a) The payload settings used by HALO. (b, c) Compared to domain randomization (DR), HALO significantly improves the accuracy of straight-line bidirectional walking. (d) Data collection under payload conditions with a single-foot constraint. (e) HALO enables challenging humanoid locomotion skills.
  • Figure 2: Overview of HALO. (a) Data Collection: Trajectories are collected under both loaded and unloaded conditions using an exploration policy trained with wide DR, followed by real-world deployment with a fixed-foot constraint. (b) Data Processing: Full-body trajectories are reconstructed from joint-state measurements via forward kinematics and foot-height alignment. (c) Two-stage Payload-related Parameter Identification: Stage 1 optimizes the full set of model parameters on trajectories collected without payload, yielding a calibrated base model. Starting from this calibrated model, Stage 2 optimizes only the payload-related parameters, using trajectories collected under loaded conditions. (d) Heavy-loaded Motion Skills: The accurately identified model parameters enable zero-shot sim-to-real transfer of the learned skills to the physical heavy-loaded humanoid.
  • Figure 3: Inconsistent left-right foot heights caused by sensor noise. (a) In the original trajectory, the right foot position calculated via forward kinematics remains suspended in the air. (b) In the optimized trajectory, the right foot maintains physically consistent ground contact.
  • Figure 4: Comparison of convergence performance between two-stage and one-stage methods. The proposed two-stage method HALO (orange) is more accurate, converging significantly closer to the estimated reference values (red dashed lines) than the one-stage baseline (blue).
  • Figure 5: Experiment settings and results of scenario 1. The metrics we define are shown in the image; HALO outperforms the baselines across all of them.