PolySim: Bridging the Sim-to-Real Gap for Humanoid Control via Multi-Simulator Dynamics Randomization

Zixing Lei, Zibo Zhou, Sheng Yin, Yueru Chen, Qingyao Xu, Weixin Li, Yunhong Wang, Bowei Tang, Wei Jing, Siheng Chen

TL;DR

PolySim tackles the sim-to-real gap in humanoid whole-body control by training policies across multiple heterogeneous simulators, enabling dynamics-level domain randomization. It introduces training–simulation isolation, a Simulator Router for unified interfaces, and GPU-direct communication to support parallel, high-throughput rollouts. Theoretical results show a tighter upper bound on simulator bias when mixing dynamics, and empirical results demonstrate improved sim-to-sim generalization and zero-shot real transfer to a Unitree G1. This approach reduces reliance on real-world data and offers a scalable path toward robust, generalizable humanoid control.

Abstract

Humanoid whole-body control (WBC) policies trained in simulation often suffer from the sim-to-real gap, which fundamentally arises from simulator inductive bias, the inherent assumptions and limitations of any single simulator. These biases lead to nontrivial discrepancies both across simulators and between simulation and the real world. To mitigate the effect of simulator inductive bias, the key idea is to train policies jointly across multiple simulators, encouraging the learned controller to capture dynamics that generalize beyond any single simulator's assumptions. We thus introduce PolySim, a WBC training platform that integrates multiple heterogeneous simulators. PolySim can launch parallel environments from different engines simultaneously within a single training run, thereby realizing dynamics-level domain randomization. Theoretically, we show that PolySim yields a tighter upper bound on simulator inductive bias than single-simulator training. In experiments, PolySim substantially reduces motion-tracking error in sim-to-sim evaluations; for example, on MuJoCo, it improves execution success by 52.8 over an IsaacSim baseline. PolySim further enables zero-shot deployment on a real Unitree G1 without additional fine-tuning, showing effective transfer from simulation to the real world. We will release the PolySim code upon acceptance of this work.

Paper Structure

This paper contains 25 sections, 2 theorems, 11 equations, 5 figures, 3 tables.

Key Result

Lemma 1

The sim-to-real gap under a policy class $\Pi$ is upper-bounded by a quantity that depends on $\gamma$, $L_V$, and $\Delta_i$, where $\gamma\in(0,1)$ is the discount factor, $L_V$ is the Lipschitz constant of $V^{\pi,0}$, and $\Delta_i:=\sup_{s,a}W_1\!\left(T_0(\cdot|s,a),\tilde{T}_i(\cdot|s,a)\right)$.
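To make $\Delta_i$ concrete, the following is a minimal sketch on a toy one-dimensional system. It assumes that $W_1$ denotes the 1-Wasserstein distance, that $T_0$ is the real-world transition kernel, and that $\tilde{T}_i$ is the transition kernel of simulator $i$. The toy dynamics, the `gain` parameter, and all function names are hypothetical and not taken from the paper. The supremum is approximated by a maximum over a finite grid of states and actions, so the result is only a lower estimate of the true $\Delta_i$.

```python
# Toy estimate of Delta_i = sup_{s,a} W_1(T_0(.|s,a), T~_i(.|s,a)).
# All dynamics here are illustrative stand-ins, not the paper's models.

NOISE = [-0.2, -0.1, 0.0, 0.1, 0.2]      # fixed noise samples (deterministic)
STATES = [-1.0, 0.0, 1.0]                # finite grid standing in for sup over s
ACTIONS = [-1.0, -0.5, 0.0, 0.5, 1.0]    # finite grid standing in for sup over a


def w1_empirical(xs, ys):
    """1-Wasserstein distance between two equal-size empirical
    distributions on the real line: mean gap between sorted samples."""
    assert len(xs) == len(ys)
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def real_next_states(s, a):
    """Samples from the assumed real-world kernel T_0(.|s,a)."""
    return [s + 1.0 * a + n for n in NOISE]


def sim_next_states(s, a, gain):
    """Samples from a hypothetical simulator kernel whose actuator
    gain differs from the real system."""
    return [s + gain * a + n for n in NOISE]


def delta(gain, states, actions):
    """Grid approximation of Delta_i for the simulator with this gain."""
    return max(
        w1_empirical(real_next_states(s, a), sim_next_states(s, a, gain))
        for s in states
        for a in actions
    )


if __name__ == "__main__":
    print(delta(0.8, STATES, ACTIONS))   # simulator with 20% gain error
    print(delta(1.0, STATES, ACTIONS))   # simulator matching the toy real system
```

In this toy setting the discrepancy grows with the magnitude of the action, so the worst case sits at the edge of the action grid. A simulator whose dynamics match the real system has zero discrepancy.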

Figures (5)

  • Figure 1: PolySim, a parallel training framework, achieves whole-body agility on the Unitree G1 humanoid. Training across diverse simulators reduces motion-tracking error in sim-to-sim transfer and enables zero-shot deployment in the real world.
  • Figure 2: Visual illustration of PolySim. The pink star denotes real-world dynamics. Filled squares indicate the nominal transition dynamics of each simulator, representing its inherent inductive bias as an approximation of real-world dynamics. Hollow circles depict domain-randomized variants that perturb parameters but remain centered around their respective simulators. Training against mixtures of simulators (PolySim, purple dot) combines multiple approximations of real-world dynamics, allowing the resulting policy to lie closer to the true dynamics than any single simulator or its parameter-level domain randomization, thereby reducing the notorious sim-to-real gap.
  • Figure 3: System overview of the proposed parallel multi-simulator RL framework (Mode III). Left (Training Framework): a simulator-agnostic RL loop where a unified training configuration (scene/agent/task) drives observation and reward computation; the policy network produces actions and is updated by the optimizer. Right (Simulation): heterogeneous engines (IsaacGym/IsaacSim/Genesis/MuJoCo) are virtualized behind a Simulator Router that performs physics harmonization, API translation, and numerical normalization. The router maps the unified initialization config to engine-specific settings, dispatches actions, and returns physical variables for observation/reward calculation. Green paths indicate GPU-direct links (PyTorch RPC/NCCL over NVLink/PCIe), enabling concurrent rollouts across devices and stable, high-throughput training.
  • Figure 4: Success rate on seen and unseen simulators under different settings. 'single' indicates training on a single simulator; 'n-serial' indicates sequential training on n simulators.
  • Figure 5: Visualization of sim-to-sim performance in MuJoCo across different training methods. The five panels show the humanoid performing a forward jumping motion under single-simulator, sequential multi-simulator, and PolySim multi-simulator training. Only the PolySim policy trained in parallel across simulators successfully executes the motion in MuJoCo.
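The router role described in the Figure 3 caption (mapping a unified configuration to engine-specific settings, dispatching actions, and normalizing the returned physical variables) can be sketched as follows. This is a single-process illustration under stated assumptions, not the PolySim implementation: `UnifiedConfig`, `ToyEngineRad`, `ToyEngineDeg`, and `SimulatorRouter` are hypothetical names, the two engines are toy stand-ins rather than IsaacGym, IsaacSim, Genesis, or MuJoCo, and the GPU-direct transport (PyTorch RPC/NCCL) is omitted entirely.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class UnifiedConfig:
    """Hypothetical simulator-agnostic settings shared by every engine."""
    dt: float              # control period in seconds
    envs_per_engine: int   # parallel environments launched per engine


class ToyEngineRad:
    """Stand-in engine: ideal integrator, reports joint angle in radians."""

    def __init__(self, num_envs: int, dt: float) -> None:
        self.dt = dt
        self.q = [0.0] * num_envs

    def step(self, velocities: List[float]) -> List[float]:
        self.q = [q + v * self.dt for q, v in zip(self.q, velocities)]
        return list(self.q)


class ToyEngineDeg:
    """Stand-in engine with different dynamics and conventions: it loses
    10% of the commanded velocity, takes its timestep in milliseconds,
    and reports joint angle in degrees."""

    def __init__(self, num_envs: int, timestep_ms: float) -> None:
        self.dt = timestep_ms / 1000.0
        self.q_deg = [0.0] * num_envs

    def step(self, velocities: List[float]) -> List[float]:
        self.q_deg = [q + math.degrees(0.9 * v * self.dt)
                      for q, v in zip(self.q_deg, velocities)]
        return list(self.q_deg)


class SimulatorRouter:
    """Hides engine-specific settings and units behind one step() call."""

    def __init__(self, cfg: UnifiedConfig) -> None:
        self.n = cfg.envs_per_engine
        # Each entry: (engine built from the unified config,
        #              function converting its output to radians).
        self.backends = [
            (ToyEngineRad(self.n, dt=cfg.dt), lambda q: q),
            (ToyEngineDeg(self.n, timestep_ms=cfg.dt * 1000.0), math.radians),
        ]

    def step(self, actions: List[float]) -> List[float]:
        """Split the action batch per engine, step each engine, and
        return all joint angles in radians as one flat batch."""
        obs: List[float] = []
        for i, (engine, to_rad) in enumerate(self.backends):
            chunk = actions[i * self.n:(i + 1) * self.n]
            obs.extend(to_rad(q) for q in engine.step(chunk))
        return obs


router = SimulatorRouter(UnifiedConfig(dt=0.02, envs_per_engine=2))
obs = router.step([1.0, 1.0, 1.0, 1.0])   # same command sent to all envs
```

The training loop sees one batch of observations in consistent units, while the second half of the batch comes from an engine with different dynamics. In this sketch that difference is what "dynamics-level" randomization amounts to: the variation comes from the engines themselves, not from perturbed parameters of a single engine.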

Theorems & Definitions (4)

  • Definition 1: Sim-to-real gap of a simulator
  • Lemma 1: Upper-bound of the sim-to-real gap
  • Definition 2: Sim-to-real gap of PolySim
  • Theorem 1: Superiority of the PolySim S2R Gap