RobocupGym: A challenging continuous control benchmark in Robocup

Michael Beukman, Branden Ingram, Geraud Nangue Tasse, Benjamin Rosman, Pravesh Ranchod

TL;DR

This work introduces RobocupGym, an open-source library that links the SimSpark-based Robocup 3D simulator with Stable Baselines 3 to create continuous-control RL benchmarks in robotic soccer. It provides two premade tasks (SimpleKick and VelocityKick), a modular architecture for adding new tasks, and a Gymnasium-compatible interface that supports standard RL workflows and parallel training. Initial results show that both PPO and SAC can learn kicking behaviors; PPO outperforms SAC on VelocityKick, and parallelism cuts PPO training time on that task from around 60 hours with one worker to about three hours with 128 workers. By delivering a realistic, extensible robotics benchmark, RobocupGym supports practical RL research on high-dimensional control and paves the way for more sophisticated multi-task and multi-agent RL in robotic football.

Abstract

Reinforcement learning (RL) has progressed substantially over the past decade, with much of this progress being driven by benchmarks. Many benchmarks focus on video or board games, and a large number of robotics benchmarks lack diversity and real-world applicability. In this paper, we aim to simplify the process of applying reinforcement learning in the 3D simulation league of Robocup, a robotic football competition. To this end, we introduce an RL environment built on the open-source rcssserver3d soccer server, together with simple pre-defined tasks and integration with a popular RL library, Stable Baselines 3. Our environment enables the creation of high-dimensional continuous control tasks within a robotics football simulation. In each task, an RL agent controls a simulated Nao robot, and can interact with the ball or other agents. We open-source our environment and training code at https://github.com/Michael-Beukman/RobocupGym.
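
To make the idea of a Gymnasium-compatible kick task concrete, here is a minimal sketch of the interaction loop such an interface implies. It is not RobocupGym's code: `ToyKickEnv`, its joint count, its dynamics, and its reward are all hypothetical stand-ins that use only the Python standard library, so the simulator is not required. The only thing taken from Gymnasium is the calling convention, where `reset()` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`.

```python
import random


class ToyKickEnv:
    """Hypothetical stand-in for a kick task; NOT RobocupGym's actual code."""

    def __init__(self, n_joints=4, max_steps=20, seed=0):
        self.n_joints = n_joints      # assumed number of controlled joints
        self.max_steps = max_steps    # assumed episode length
        self.rng = random.Random(seed)
        self.joints = [0.0] * n_joints
        self.ball_x = 0.0
        self.t = 0

    def reset(self):
        # Start each episode with small random joint offsets and the ball at 0.
        self.joints = [self.rng.uniform(-0.1, 0.1) for _ in range(self.n_joints)]
        self.ball_x = 0.0
        self.t = 0
        return self._obs(), {}

    def step(self, action):
        # Continuous actions, clipped to [-1, 1] per joint.
        action = [max(-1.0, min(1.0, a)) for a in action]
        self.joints = [j + 0.1 * a for j, a in zip(self.joints, action)]
        # Toy dynamics (an assumption): the ball advances by the positive
        # part of the mean joint command.
        push = max(0.0, sum(action) / self.n_joints)
        self.ball_x += push
        self.t += 1
        reward = push                 # assumed reward: distance gained this step
        terminated = False            # this toy task has no failure state
        truncated = self.t >= self.max_steps
        return self._obs(), reward, terminated, truncated, {"ball_x": self.ball_x}

    def _obs(self):
        return self.joints + [self.ball_x]


def run_episode(env, policy):
    """Roll out one episode and return the summed reward."""
    obs, info = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total
```

Because the loop only relies on the `reset`/`step` contract, the same `run_episode` helper would work unchanged with any environment that follows that contract, which is the practical benefit of exposing the simulator through a standard interface.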

Paper Structure (9 sections, 5 figures, 1 table)

This paper contains 9 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: An illustration of the Robocup 3D simulation domain, where two teams of 11 players compete against each other.
  • Figure 2: Illustrating the (a) initialisation and (b) processing pipeline of RobocupGym.
  • Figure 3: Comparing PPO and SAC on the VelocityKick environment. PPO outperforms SAC, and scales well as we increase the number of workers.
  • Figure 4: Comparing the two different kick environments we provide. Both lead to reasonable kicks, but VelocityKick is faster while reaching slightly shorter distances. We use PPO in this plot.
  • Figure 5: Runtime for a varying number of processes (for PPO) on the VelocityKick environment. We find a reduction in runtime from around 60 hours with one worker to three hours for 128 workers.
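
The runtime figures quoted for Figure 5 can be turned into a speedup and a parallel efficiency with a short calculation. The sketch below uses the approximate values from the caption (around 60 hours with one worker, three hours with 128 workers); the function name is illustrative, and the result is only as precise as those rounded inputs.

```python
def speedup_and_efficiency(t_serial_hours, t_parallel_hours, workers):
    """Return (speedup, parallel efficiency) for a given worker count."""
    speedup = t_serial_hours / t_parallel_hours
    efficiency = speedup / workers   # 1.0 would mean perfect linear scaling
    return speedup, efficiency


# Approximate values taken from the Figure 5 caption.
speedup, efficiency = speedup_and_efficiency(60.0, 3.0, 128)
print(f"speedup: {speedup:.0f}x, parallel efficiency: {efficiency:.1%}")
```

With these inputs the speedup is about 20x, which corresponds to a parallel efficiency of roughly 16%: adding workers shortens wall-clock time substantially, but far from linearly.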