Learning Velocity-based Humanoid Locomotion: Massively Parallel Learning with Brax and MJX

William Thibault, William Melek, Katja Mombaur

TL;DR

The paper addresses robust humanoid locomotion by learning a velocity-based policy for the REEM-C robot. It introduces a periodic reward formulation and trains the policy with PPO in the Brax/MJX framework, which enables massively parallel, GPU-accelerated learning. The key contributions are the first velocity-based RL locomotion policy for a real biped implemented in this simulator, stable walking in simulation (e.g., at 1 m/s), and concrete steps toward hardware verification that use improved actuator models and TSID tracking to close the sim-to-real gap. The approach is aimed at fast, scalable training of versatile gaits, with a view to deploying humanoid robots in dynamic real-world environments.

Abstract

Humanoid locomotion is a key skill for bringing humanoids out of the lab and into the real world. Many motion generation methods for locomotion have been proposed, including reinforcement learning (RL). RL locomotion policies offer great versatility and generalizability, along with the ability to improve over time from new experience. This work presents a velocity-based RL locomotion policy for the REEM-C robot. The policy uses a periodic reward formulation and is implemented in Brax/MJX for fast training. Simulation results for the policy are presented; hardware experiments are in progress.

Paper Structure (6 sections, 2 figures, 1 table)

This paper contains 6 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Stance and flight reward indicators depending on different gait phases
  • Figure 2: REEM-C walking forward at 1.0 m/s
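
The stance and flight indicators named in Figure 1 can be illustrated with a small sketch. This is not the authors' formulation: the binary indicator shape, the 0.6 stance ratio, the per-foot phase offset, and the function names are all assumptions chosen for illustration, and the penalty terms follow the common idea of discouraging foot motion during stance and foot contact force during flight. It uses plain Python for readability; a Brax/MJX implementation would presumably express the same logic with array operations.

```python
def stance_indicator(phase, offset, stance_ratio=0.6):
    """Return 1.0 while the foot is scheduled to be on the ground, else 0.0.

    phase: gait phase in [0, 1), shared by both feet (assumed convention).
    offset: per-foot phase shift, e.g. 0.0 for one foot and 0.5 for the other.
    stance_ratio: assumed fraction of the cycle spent in stance.
    """
    foot_phase = (phase + offset) % 1.0
    return 1.0 if foot_phase < stance_ratio else 0.0


def flight_indicator(phase, offset, stance_ratio=0.6):
    """Return 1.0 while the foot is scheduled to be in the air, else 0.0."""
    return 1.0 - stance_indicator(phase, offset, stance_ratio)


def periodic_foot_reward(phase, offset, foot_speed, foot_force, stance_ratio=0.6):
    """Hypothetical periodic reward term for a single foot.

    Penalizes foot speed during scheduled stance and contact force during
    scheduled flight. Inputs are assumed to be non-negative and pre-scaled.
    """
    stance = stance_indicator(phase, offset, stance_ratio)
    flight = 1.0 - stance
    return -(stance * foot_speed + flight * foot_force)
```

With these assumed values, a foot with offset 0.0 is in stance at phase 0.0 and in flight at phase 0.7, while a foot with offset 0.5 is in stance at phase 0.7, which gives the alternating pattern expected of a walking gait.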