An Architecture for Unattended Containerized (Deep) Reinforcement Learning with Webots
Tobias Haubold, Petra Linke
TL;DR
The paper tackles the challenge of performing unattended reinforcement learning in 3D robot simulations without requiring data scientists to master the underlying simulation software. It proposes a two‑container architecture in which a Webots simulator runs on demand behind a Webots Facade, and a separate training environment communicates via ROS and Python gymnasium APIs, enabling automated, reproducible pipelines. The Robotino‑centered example demonstrates end‑to‑end training with two tf_agents algorithms over hundreds of hours, while detailing limitations (e.g., single Webots instance) and practical workarounds. The approach generalizes beyond Robotino by separating world creation, model development, and infrastructure, supporting scalable RL workflows that can extend to other robots and industrial use cases.
Abstract
As data science applications gain adoption across industries, the tooling landscape is maturing to support the life cycle of such applications and to address the challenges involved, boosting the productivity of the people who build them. Reinforcement learning with agents in a 3D world can still face two challenges: the knowledge required to use simulation software, and the use of standalone simulation software in unattended training pipelines. In this paper we review tools and approaches for training reinforcement learning agents for robots in 3D worlds, with a focus on the robot Robotino, and argue that separating the simulation environment used by creators of virtual worlds from the model development environment used by data scientists is not a well-covered topic. Often the two environments are the same, and data scientists need knowledge of the simulation software to work directly with its APIs. Moreover, creators of virtual worlds and data scientists sometimes even work on the same files. We contribute to this topic by describing an approach in which data scientists do not need knowledge of the simulation software. Our approach uses the standalone simulation software Webots, the Robot Operating System to communicate with the simulated robots as well as with the simulation software itself, and container technology to separate the simulation from the model development environment. We place emphasis on the APIs that data scientists work with and on the use of standalone simulation software in unattended training pipelines. We also show which parts are specific to the Robotino and to the robot task to be learned.
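To make the separation described above more concrete, the following is a minimal sketch of what the data scientist's side might look like. It is an illustration under stated assumptions, not the implementation from this paper: `FakeSimulatorFacade`, `RobotTaskEnv`, and all of their methods are hypothetical names, the facade here is an in-process stub rather than a containerized Webots instance reached over ROS, and the environment only mirrors the `reset`/`step` return conventions of the gymnasium API without importing the library.

```python
class FakeSimulatorFacade:
    """Hypothetical stand-in for the simulator-side facade.

    Assumption: the real facade would start Webots on demand and talk to
    the robot over ROS. This stub only tracks a 1D position in memory.
    """

    def __init__(self):
        self.running = False
        self.position = 0.0

    def start_simulation(self):
        self.running = True
        self.position = 0.0

    def stop_simulation(self):
        self.running = False

    def send_velocity(self, velocity):
        if not self.running:
            raise RuntimeError("simulation is not running")
        self.position += velocity

    def read_position(self):
        return self.position


class RobotTaskEnv:
    """Environment following gymnasium-style reset/step conventions.

    The data scientist only sees this class; simulator details stay
    behind the facade object passed to the constructor.
    """

    ACTIONS = (-1.0, 0.0, 1.0)  # hypothetical discrete velocity commands

    def __init__(self, facade, goal=5.0, max_steps=20):
        self.facade = facade
        self.goal = goal
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, seed=None):
        # Restart the simulation so every episode begins from a clean state.
        self.facade.stop_simulation()
        self.facade.start_simulation()
        self.steps = 0
        return self.facade.read_position(), {}

    def step(self, action):
        self.facade.send_velocity(self.ACTIONS[action])
        self.steps += 1
        position = self.facade.read_position()
        terminated = position >= self.goal
        truncated = (not terminated) and self.steps >= self.max_steps
        reward = 1.0 if terminated else -0.01
        return position, reward, terminated, truncated, {}


if __name__ == "__main__":
    env = RobotTaskEnv(FakeSimulatorFacade(), goal=3.0)
    observation, info = env.reset()
    done = False
    while not done:
        # Always drive forward; a trained agent would choose the action.
        observation, reward, terminated, truncated, info = env.step(2)
        done = terminated or truncated
    print("final position:", observation)
```

In this sketch, swapping the stub for a facade that talks to a real simulator would not change the training loop, which is the kind of decoupling the abstract argues for.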
