Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning
Nikita Rudin, David Hoeller, Philipp Reist, Marco Hutter
TL;DR
This work addresses the long training times of deep reinforcement learning (DRL) for legged locomotion with a massively parallel, GPU-based training pipeline built on Isaac Gym. It combines on-policy PPO with a game-inspired automatic curriculum to train thousands of simulated robots simultaneously on a single workstation GPU. For the ANYmal quadruped, training takes under four minutes on flat terrain and about twenty minutes on uneven terrain, and the resulting policies transfer to the real robot. The main contributions are an analysis of how parallelism and the associated hyper-parameter changes affect final policy performance and training time, a curriculum that scales to thousands of robots across diverse terrains, and open-source training code. The results show fast, repeatable policy generation that generalizes across robot variants and shortens the path to deployment on real hardware.
Abstract
In this work, we present and study a training set-up that achieves fast policy generation for real-world robotic tasks by using massive parallelism on a single workstation GPU. We analyze and discuss the impact of different training algorithm components in the massively parallel regime on the final policy performance and training times. In addition, we present a novel game-inspired curriculum that is well suited for training with thousands of simulated robots in parallel. We evaluate the approach by training the quadrupedal robot ANYmal to walk on challenging terrain. The parallel approach allows training policies for flat terrain in under four minutes, and in twenty minutes for uneven terrain. This represents a speedup of multiple orders of magnitude compared to previous work. Finally, we transfer the policies to the real robot to validate the approach. We open-source our training code to help accelerate further research in the field of learned legged locomotion.
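To illustrate how a game-inspired curriculum can operate when thousands of robots train in parallel, the following is a minimal sketch, not the released training code. It assumes that each robot is assigned a terrain difficulty level, that a robot is promoted after walking more than half the length of its terrain tile, demoted after covering less than half the distance its velocity command asked for, and sent back to a random level once it clears the hardest one. The function name `update_terrain_levels`, the tile length, the number of levels, and both thresholds are hypothetical choices made for this example.

```python
import random


def update_terrain_levels(levels, distance_walked, commanded_distance,
                          terrain_length=8.0, max_level=9, rng=random):
    """Return new terrain levels for robots whose episodes just ended.

    All thresholds and defaults are illustrative assumptions.
    """
    new_levels = []
    for level, walked, commanded in zip(levels, distance_walked, commanded_distance):
        if walked > terrain_length / 2:
            # Assumed promotion rule: the robot crossed half of its terrain tile.
            level += 1
        elif walked < 0.5 * commanded:
            # Assumed demotion rule: the robot covered less than half
            # of the distance implied by its velocity command.
            level -= 1
        if level > max_level:
            # A robot that clears the hardest level is re-assigned to a random
            # level, so easier terrain keeps being revisited.
            level = rng.randint(0, max_level)
        # Levels never drop below the easiest terrain.
        new_levels.append(max(level, 0))
    return new_levels


if __name__ == "__main__":
    levels = [0, 3, 9, 5]
    walked = [5.0, 0.5, 6.0, 3.0]      # metres covered in the episode
    commanded = [4.0, 4.0, 4.0, 4.0]   # metres implied by the velocity command
    print(update_terrain_levels(levels, walked, commanded))
```

Because every robot carries its own level, the update needs no global schedule: the population spreads over the difficulty range on its own, which is what makes this kind of rule convenient at the scale of thousands of simulated robots. In a GPU pipeline the same logic would be written as batched tensor operations rather than a Python loop.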
