Sim-to-Real of Humanoid Locomotion Policies via Joint Torque Space Perturbation Injection

Woohyun Cha, Junhyeok Cha, Jaeyong Shin, Donghyeon Kim, Jaeheung Park

TL;DR

The paper tackles the sim-to-real challenge for humanoid locomotion by addressing the limitations of domain randomization (DR). It introduces a perturbation-injection mechanism in joint torque space: a neural network τ_φ generates state-dependent torque disturbances, which are randomized per episode and added to the policy torque τ_π during forward simulation. Trained with PPO, AMP, and a gradient penalty, the method uses privileged observations and a motion-imitation objective to learn stable, natural gaits for TOCABI while remaining robust to unseen actuator and contact dynamics. Experiments in simulation and on the real robot show greater robustness to complex reality gaps than the DR and random force injection baselines, with no loss in nominal task performance. The approach may also apply to other high-dimensional robotic systems, since it allows more expressive modeling of unmodeled dynamics during training.

Abstract

This paper proposes a novel alternative to existing sim-to-real methods for training control policies with simulated experiences. Prior sim-to-real methods for legged robots mostly rely on the domain randomization approach, where a fixed finite set of simulation parameters is randomized during training. Instead, our method adds state-dependent perturbations to the input joint torque used for forward simulation during the training phase. These state-dependent perturbations are designed to simulate a broader range of reality gaps than those captured by randomizing a fixed set of simulation parameters. Experimental results show that our method enables humanoid locomotion policies that achieve greater robustness against complex reality gaps unseen in the training domain.
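The injection step described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the two-layer network, the Gaussian weight sampling, the `max_torque` bound, and the names `TorquePerturbation` and `perturbed_torque` are all assumptions made for the example. What it does take from the text is that the perturbation depends on the state, is re-randomized for each episode, and is added to the policy torque τ_π before forward simulation.

```python
import numpy as np


class TorquePerturbation:
    """State-dependent torque perturbation tau_phi(s).

    Hypothetical sketch: the weights phi are drawn at random and held fixed
    within an episode, so the disturbance is a fixed function of the state
    until reset() is called again.
    """

    def __init__(self, state_dim, num_joints, hidden=32, max_torque=5.0, seed=0):
        self.state_dim = state_dim
        self.num_joints = num_joints
        self.hidden = hidden
        self.max_torque = max_torque  # assumed per-joint bound on the perturbation
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        """Sample new weights phi; call once at the start of every episode."""
        s, h, j = self.state_dim, self.hidden, self.num_joints
        self.w1 = self.rng.normal(0.0, 1.0 / np.sqrt(s), (s, h))
        self.b1 = self.rng.normal(0.0, 0.1, h)
        self.w2 = self.rng.normal(0.0, 1.0 / np.sqrt(h), (h, j))
        self.b2 = self.rng.normal(0.0, 0.1, j)

    def __call__(self, state):
        hidden = np.tanh(state @ self.w1 + self.b1)
        # The outer tanh keeps every joint's perturbation within +/- max_torque.
        return self.max_torque * np.tanh(hidden @ self.w2 + self.b2)


def perturbed_torque(tau_pi, state, perturbation):
    """Torque handed to the simulator's forward dynamics: tau_pi + tau_phi(s)."""
    return tau_pi + perturbation(state)


# Usage: one perturbation per environment, re-sampled at each episode start.
perturbation = TorquePerturbation(state_dim=24, num_joints=12)
state = np.zeros(24)   # placeholder robot state
tau_pi = np.ones(12)   # placeholder policy torque
tau_applied = perturbed_torque(tau_pi, state, perturbation)
```

Unlike randomizing a fixed set of simulator parameters, a perturbation of this form varies with the state within an episode, which is the property the abstract relies on to cover a broader range of reality gaps.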

Paper Structure

This paper contains 21 sections, 13 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Control policies trained in simulation cannot be directly deployed to different environments due to the reality gap arising from the discrepancies between environments. Sim-to-real methods are used to enable sim-to-sim or sim-to-real transfer. This work proposes a novel sim-to-real method that injects state-dependent joint torque perturbations for enhanced robustness against complex reality gaps. The experimental results demonstrate that our approach outperforms existing methods in both simulation and real-world environments.
  • Figure 2: Forward velocity (X), lateral velocity (Y), and heading command tracking performance when commanded to walk forward at 0.4 m/s. No significant differences in command tracking performance are observed, which indicates that the enhanced robustness of our method does not come at the expense of task performance.
  • Figure 3: Forward velocity command tracking performance at a command velocity of 0.4 m/s when unseen actuator stiffness is introduced. All seeds of the DR baseline failed to produce proper gait patterns, and instead leaned drastically forward and lost balance. All seeds of our method and the ERFI baseline succeeded in producing proper gait patterns and following velocity commands.
  • Figure 4: Augmenting ground softness in MuJoCo. Setting the timeconst in the solref parameter of the ground higher (right) in simulation causes objects to sink further into the ground before the reaction force is applied, which gives the impression of a softer ground and results in more contact penetration.
  • Figure 5: Forward velocity command tracking performance at a command velocity of 0.4 m/s when unseen ground contact dynamics are introduced. No seed of the DR baseline or the ERFI baseline was able to move forward without falling. All seeds of our method succeeded in walking forward without falling.
  • ...and 2 more figures