CRoSS: A Continual Robotic Simulation Suite for Scalable Reinforcement Learning with High Task Diversity and Realistic Physics Simulation
Yannick Denker, Alexander Gepperth
TL;DR
CRoSS introduces a Gazebo-based continual reinforcement learning (CRL) benchmark for robotics that addresses gaps in realism, task diversity, and out-of-the-box usability in prior CRL suites. It combines two platforms (a two-wheeled differential-drive robot and a 7-DOF robot arm) with six benchmarks that cover Cartesian and joint-space control, plus fast kinematics-only variants to accelerate experimentation. Baseline RL methods such as DQN and REINFORCE exhibit clear catastrophic forgetting across sequential tasks, while the kinematics-only variants enable rapid hyperparameter search without sacrificing task structure or transferability to the simulated physics. The benchmark emphasizes reproducibility and sim-to-real potential via containerized deployment and ROS-Gazebo interoperability, making CRoSS a scalable, extensible foundation for advancing CRL in robotics and prospective real-world deployments.
Abstract
Continual reinforcement learning (CRL) requires agents to learn from a sequence of tasks without forgetting previously acquired policies. In this work, we introduce a novel benchmark suite for CRL based on realistically simulated robots in the Gazebo simulator. Our Continual Robotic Simulation Suite (CRoSS) benchmarks rely on two robotic platforms: a two-wheeled differential-drive robot with lidar, camera, and bumper sensors, and a robotic arm with seven joints. The former represents an agent in line-following and object-pushing scenarios, where varying visual and structural parameters yields a large number of distinct tasks. The latter is used in two goal-reaching scenarios: one with high-level Cartesian hand-position control (modeled after the Continual World benchmark) and one with low-level control based on joint angles. For the robotic arm benchmarks, we provide additional kinematics-only variants that bypass the need for physical simulation (as long as no sensor readings are required) and can be run two orders of magnitude faster. CRoSS is designed to be easily extensible, enables controlled studies of continual reinforcement learning in robotic settings with high physical realism, and in particular allows the use of almost arbitrary simulated sensors. To ensure reproducibility and ease of use, we provide a containerized setup (Apptainer) that runs out-of-the-box, and we report the performance of standard RL algorithms, including Deep Q-Networks (DQN) and policy gradient methods. These results highlight the suitability of CRoSS as a scalable and reproducible benchmark for CRL research.
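To make "learning a sequence of tasks without forgetting" concrete, the following minimal sketch computes one commonly used forgetting measure from a matrix of evaluation scores. It is an illustration only and not necessarily the metric used by CRoSS: the function name `forgetting`, the matrix layout, and all numbers are hypothetical assumptions.

```python
from typing import List


def forgetting(perf: List[List[float]]) -> List[float]:
    """Per-task forgetting from an evaluation matrix (illustrative, not the CRoSS metric).

    perf[i][j] is the performance on task j, measured after training on task i
    (tasks are trained in order 0, 1, ..., n-1).
    For every task except the last, return the drop from the best score the
    task reached (from the time it was trained until just before the final
    task) to its score after the final task.
    """
    n = len(perf)
    final = perf[n - 1]
    drops = []
    for j in range(n - 1):
        best = max(perf[i][j] for i in range(j, n - 1))
        drops.append(best - final[j])
    return drops


# Hypothetical scores for three sequential tasks (made up for illustration).
perf = [
    [0.9, 0.1, 0.0],  # after training on task 0
    [0.4, 0.8, 0.1],  # after training on task 1
    [0.2, 0.5, 0.9],  # after training on task 2
]
per_task = forgetting(perf)
avg = sum(per_task) / len(per_task)
print(per_task, avg)
```

A large positive value for a task means its performance degraded after later tasks were learned, which is the catastrophic-forgetting behavior a CRL benchmark is meant to expose.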
