Scilab-RL: A software framework for efficient reinforcement learning and cognitive modeling research

Jan Dohmen, Frank Röder, Manfred Eppe

TL;DR

Scilab-RL tackles onboarding bottlenecks in cognitive modeling and reinforcement learning by delivering a modular, open-source framework that unifies goal-conditioned RL, robotic environments, and visualization tools. It combines Stable Baselines 3, OpenAI Gym, MuJoCo, and CoppeliaSim within a YAML-configured Python core, with Hydra and Optuna for hyperparameter optimization and MLflow or Weights & Biases for online metric tracking. A key idea is to augment the extrinsic reward $r_e$ with an intrinsic term $r_i$ as $r = (1 - \eta) r_e + \eta r_i$, for $0 \le \eta \le 1$, demonstrated in an illustrative SAC variant that adds critic variance to the reward. The framework accelerates experimentation for both newcomers and experts, enabling rapid development, testing, and comparison of algorithms and environments in robotics research.
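The reward mixing rule above can be illustrated with a minimal Python sketch. It is not the framework's implementation: the function names `mix_rewards` and `critic_variance` are hypothetical, the numeric values are made up, and treating the intrinsic term as the population variance across an ensemble of critic estimates is an assumption based only on the description of the modified SAC variant.

```python
from statistics import pvariance
from typing import Sequence


def mix_rewards(r_e: float, r_i: float, eta: float) -> float:
    """Return r = (1 - eta) * r_e + eta * r_i, with 0 <= eta <= 1."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return (1.0 - eta) * r_e + eta * r_i


def critic_variance(q_values: Sequence[float]) -> float:
    """Hypothetical intrinsic reward: population variance of the
    Q-value estimates produced by several critics for one transition."""
    return pvariance(q_values)


# Made-up example values for a single transition.
r_e = -1.0                 # extrinsic reward from the environment
q_estimates = [0.2, 0.6]   # estimates from two critics (assumed ensemble)
r_i = critic_variance(q_estimates)
r = mix_rewards(r_e, r_i, eta=0.25)
```

With $\eta = 0$ the agent sees only the extrinsic reward, and with $\eta = 1$ only the intrinsic term, so $\eta$ is a natural candidate for hyperparameter optimization.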

Abstract

One problem in cognitive modeling and reinforcement learning (RL) research is that researchers spend too much time setting up an appropriate computational framework for their experiments. Many open-source implementations of current RL algorithms exist, but there is no modular suite of tools that combines different robotic simulators and platforms, data visualization, hyperparameter optimization, and baseline experiments. To address this problem, we present Scilab-RL, a software framework for efficient research in cognitive modeling and reinforcement learning for robotic agents. The framework focuses on goal-conditioned reinforcement learning using Stable Baselines 3 and the OpenAI Gym interface. It natively supports experiment visualization and hyperparameter optimization. We describe how these features enable researchers to conduct experiments with minimal setup time, thus maximizing research output.

Paper Structure (16 sections, 2 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: An overview of the tools used in Scilab-RL.
  • Figure 2: Online rendering capabilities. Example using the MuJoCo FetchPush environment.
  • Figure 3: Hyperparameter optimization for the modified SAC algorithm adding critic variance to the reward. Experiment for the FetchPush environment.