Benchmarking Population-Based Reinforcement Learning across Robotic Tasks with GPU-Accelerated Simulation
Asad Ali Shahid, Yashraj Narang, Vincenzo Petrone, Enrico Ferrentino, Ankur Handa, Dieter Fox, Marco Pavone, Loris Roveda
TL;DR
This work addresses the data inefficiency of deep reinforcement learning in robotics by combining GPU-accelerated simulation with population-based training, which improves exploration and adapts hyperparameters online. It systematically benchmarks Population-Based Reinforcement Learning (PBRL) against PPO, SAC, and DDPG across four Isaac Gym tasks, and demonstrates sim-to-real transfer by deploying a PBRL policy on a Franka Panda without additional adaptation. The results show that PBRL often yields higher final rewards and faster convergence, with gains that vary by task and algorithm; the real-world deployment further validates the approach. The authors release an open-source codebase to support further work on PBRL in challenging robotic manipulation tasks.
Abstract
In recent years, deep reinforcement learning (RL) has shown its effectiveness in solving complex continuous control tasks. However, this comes at the cost of an enormous amount of experience required for training. The problem is exacerbated by the sensitivity of both learning efficiency and policy performance to hyperparameter selection, which often requires numerous time-consuming trial experiments. This work leverages a Population-Based Reinforcement Learning (PBRL) approach and a GPU-accelerated physics simulator to enhance the exploration capabilities of RL by training multiple policies in parallel. The PBRL framework, which dynamically adjusts hyperparameters based on the performance of the learning agents, is benchmarked against three state-of-the-art RL algorithms -- PPO, SAC, and DDPG. The experiments are performed on four challenging tasks in Isaac Gym -- Anymal Terrain, Shadow Hand, Humanoid, and Franka Nut Pick -- and analyze the effect of population size and of the mutation mechanisms for hyperparameters. The results show that PBRL agents achieve superior performance, in terms of cumulative reward, compared to non-evolutionary baseline agents. Moreover, the trained agents are deployed in the real world on a Franka Nut Pick task. To our knowledge, this is the first sim-to-real attempt at deploying PBRL agents on real hardware. Code and videos of the learned policies are available on our project website (https://sites.google.com/view/pbrl).
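The hyperparameter adaptation described in the abstract can be illustrated with a minimal Python sketch of a generic exploit/explore update from population-based training. This is not the authors' implementation: the function name `pbt_step`, the agent dictionary layout, the `exploit_frac` fraction, and the `perturb` factors are all assumptions chosen for illustration, and policy weights are omitted.

```python
import random


def pbt_step(population, rng, exploit_frac=0.25, perturb=(0.8, 1.2)):
    """Apply one exploit/explore update to a population of agents.

    Each agent is a dict with 'hparams' (name -> float) and 'reward' (float).
    Hypothetical layout for illustration; a real system would also copy
    policy weights from the parent during the exploit step.
    """
    # Rank agents from best to worst by their most recent reward.
    ranked = sorted(population, key=lambda a: a["reward"], reverse=True)
    k = max(1, int(len(ranked) * exploit_frac))
    top, bottom = ranked[:k], ranked[-k:]

    for agent in bottom:
        # Exploit: pick a parent among the best-performing agents.
        parent = rng.choice(top)
        # Explore: perturb each inherited hyperparameter by a random factor.
        agent["hparams"] = {
            name: value * rng.choice(perturb)
            for name, value in parent["hparams"].items()
        }
    return population


if __name__ == "__main__":
    rng = random.Random(0)
    agents = [
        {"hparams": {"lr": 1e-3 * (i + 1)}, "reward": float(i)}
        for i in range(4)
    ]
    pbt_step(agents, rng)
    for agent in agents:
        print(agent)
```

In this sketch only the lowest-ranked agents are changed; the rest keep training with their current hyperparameters until the next update. How often the update runs, and which mutation scheme is applied, are design choices; the paper studies the effect of such choices.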
