Controlgym: Large-Scale Control Environments for Benchmarking Reinforcement Learning Algorithms

Xiangyuan Zhang; Weichao Mao; Saviz Mowlavi; Mouhacine Benosman; Tamer Başar

TL;DR

controlgym provides a scalable library of linear and PDE-based control environments, integrated with Gym/Gymnasium, for benchmarking reinforcement learning in continuous, high-dimensional settings. By offering both discrete-time linear state-space models and space-time discretized PDE dynamics with distributed inputs, it enables the evaluation of RL convergence, stability, and scalability to high- and potentially infinite-dimensional systems. The framework accommodates model-based baselines (e.g., LQG/LQR, H2/H∞) and model-free RL (e.g., PPO), as well as open-loop analyses of the PDEs through eigenvalue calculations and zero-controller trajectories. The project serves the learning for dynamics & control (L4DC) community as a versatile, open-source testbed for connecting theory with practice and for designing robust controllers in industrially relevant settings.
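The TL;DR lists LQR among the model-based baselines for the linear environments. The following is a minimal sketch of that kind of baseline, assuming a noise-free, single-input, discrete-time model x_{k+1} = A x_k + B u_k with stage cost x_k^T Q x_k + r u_k^2. The matrices are a hypothetical sampled double integrator, not one of the thirty-six controlgym environments. The helper names (dlqr_single_input, lqr_gain, eig2) are illustrative. The Riccati fixed-point iteration is a generic textbook method and not necessarily how controlgym computes its baselines.

```python
import cmath


def matmul(X, Y):
    """Nested-list matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def combine(X, Y, sign=1.0):
    """Elementwise X + sign * Y."""
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def lqr_gain(A, B, P, r):
    """K = (r + B^T P B)^{-1} B^T P A for a single input (B has one column)."""
    Bt = transpose(B)
    s = r + matmul(Bt, matmul(P, B))[0][0]          # scalar r + B^T P B
    return [[v / s for v in matmul(Bt, matmul(P, A))[0]]]


def dlqr_single_input(A, B, Q, r, iters=2000):
    """Fixed-point iteration of the discrete-time algebraic Riccati equation."""
    P = [row[:] for row in Q]
    At = transpose(A)
    for _ in range(iters):
        K = lqr_gain(A, B, P, r)
        # P <- Q + A^T P A - A^T P B K
        PA, PB = matmul(P, A), matmul(P, B)
        P = combine(combine(Q, matmul(At, PA)), matmul(matmul(At, PB), K), sign=-1.0)
    return lqr_gain(A, B, P, r), P


def eig2(M):
    """Eigenvalues of a 2 x 2 matrix from its trace and determinant."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


# Hypothetical example: sampled double integrator (both open-loop eigenvalues equal 1).
A = [[1.0, 0.1], [0.0, 1.0]]
B = [[0.0], [0.1]]
Q = [[1.0, 0.0], [0.0, 1.0]]
K, P = dlqr_single_input(A, B, Q, r=1.0)
A_cl = combine(A, matmul(B, K), sign=-1.0)   # closed loop x_{k+1} = (A - B K) x_k
print("K =", K)
print("closed-loop |eigenvalues| =", [abs(z) for z in eig2(A_cl)])
```

With the control law u_k = -K x_k, the closed-loop eigenvalues move strictly inside the unit circle. The open-loop system, whose eigenvalues sit on the unit circle, is only marginally stable.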

Abstract

We introduce controlgym, a library of thirty-six industrial control settings and ten infinite-dimensional partial differential equation (PDE)-based control problems. Because controlgym is integrated within the OpenAI Gym/Gymnasium (Gym) framework, standard reinforcement learning (RL) algorithms, such as those implemented in stable-baselines3, can be applied to it directly. Our control environments complement those in Gym with continuous, unbounded action and observation spaces, motivated by real-world control applications. Moreover, the PDE control environments uniquely allow users to increase the state dimensionality of the system arbitrarily (in the limit, to infinity) by refining the discretization, while preserving the intrinsic dynamics. This feature is crucial for evaluating the scalability of RL algorithms for control. This project serves the learning for dynamics & control (L4DC) community, aiming to explore key questions: the convergence of RL algorithms in learning control policies; the stability and robustness issues of learning-based controllers; and the scalability of RL algorithms to high- and potentially infinite-dimensional systems. We open-source the controlgym project at https://github.com/xiangyuan-zhang/controlgym.
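The abstract states that controlgym environments plug into the Gym/Gymnasium framework, so standard RL code can interact with them. The sketch below illustrates only that interaction pattern. ToyLinearEnv is a hypothetical scalar environment, not a controlgym class, and its dynamics and reward are assumptions made for illustration. Its reset() and step() return values follow Gymnasium's documented conventions. It does not subclass gymnasium.Env and does not define observation_space or action_space, which a library such as stable-baselines3 would additionally require.

```python
import random


class ToyLinearEnv:
    """Hypothetical scalar linear environment following Gymnasium's API conventions.

    Assumed dynamics: x_{k+1} = a*x_k + b*u_k + w_k, observed as y_k = x_k + v_k.
    Actions and observations are unbounded floats, as in the abstract's motivation.
    """

    def __init__(self, a=1.05, b=1.0, noise_std=0.01, horizon=50):
        self.a, self.b = a, b
        self.noise_std = noise_std
        self.horizon = horizon
        self.rng = random.Random()
        self.x = 0.0
        self.k = 0

    def _observe(self):
        return self.x + self.rng.gauss(0.0, self.noise_std)

    def reset(self, seed=None, options=None):
        # Gymnasium convention: reset returns (observation, info).
        if seed is not None:
            self.rng.seed(seed)
        self.x = self.rng.uniform(-1.0, 1.0)
        self.k = 0
        return self._observe(), {}

    def step(self, action):
        # Gymnasium convention: (observation, reward, terminated, truncated, info).
        u = float(action)
        reward = -(self.x ** 2 + 0.1 * u ** 2)  # negative quadratic stage cost
        self.x = self.a * self.x + self.b * u + self.rng.gauss(0.0, self.noise_std)
        self.k += 1
        truncated = self.k >= self.horizon      # fixed-horizon episodes
        return self._observe(), reward, False, truncated, {}


def rollout(env, policy, seed=0):
    """Run one episode and return (total reward, number of steps)."""
    obs, _ = env.reset(seed=seed)
    total, steps, done = 0.0, 0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total += reward
        steps += 1
        done = terminated or truncated
    return total, steps


env = ToyLinearEnv()
print(rollout(env, lambda y: 0.0))        # zero controller: state grows since a > 1
print(rollout(env, lambda y: -0.9 * y))   # static output feedback u = -0.9 y
```

Both rollouts reuse the same seed and draw random numbers in the same order, so they see identical noise. Only the policy differs.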

Paper Structure (20 sections, 26 equations, 16 figures, 2 tables)

Introduction
Control Environments
Linear Control Environments
PDE Control Environments
Convection-Diffusion-Reaction Equation
Wave Equation
Schrödinger Equation
Burgers' Equation
Kuramoto-Sivashinsky Equation
Fisher Equation
Allen-Cahn Equation
Korteweg-de Vries Equation
Cahn-Hilliard Equation
Ginzburg-Landau Equation
Examples of Using controlgym
...and 5 more sections

Figures (16)

Figure 1: Environments included in this work, motivated by industrial control applications
Figure 2: Illustration of how distributed control inputs influence the dynamics of a PDE through forcing support functions $\Phi_j$, taking $\Omega = [0,1]$ as an example. Left: The forcing support function corresponding to a single control input is depicted, with its width, a tunable parameter, set to $0.3$. This represents a control input that uniformly affects state components spanning the middle $30\%$ of the physical domain. Middle: The forcing support functions corresponding to two control inputs are shown. They are spaced equidistantly from one another, and each has a width of $0.1$ so that each control input uniformly affects state components spanning $10\%$ of the physical domain. Right: The forcing support functions corresponding to five control inputs are shown, each with a width of $0.05$ uniformly affecting state components spanning $5\%$ of the physical domain.
Figure 3: The uncontrolled solution to the convection-diffusion-reaction (CDR) equation in a domain of length $L=1$ with parameters $c=0.01$, $\nu = 0.002$, and $r = 0.1$. The initial condition is $u(x, t=0) = \mathrm{sech}(10x - 5)$. Left: Contour plot of the state variable over the total simulation time ($x$-axis) and across the spatial domain ($y$-axis). Middle: The state variable at fixed times; the $x$- and $y$-axes represent spatial coordinates and values of the state variable, respectively, and line color indicates the time stamp within the total simulation time. Right: 3D surface plot of the state variable ($z$-axis) over time ($y$-axis) and across the spatial domain ($x$-axis).
Figure 4: The uncontrolled solution to the wave equation in a domain of length $L=1$ with parameter $c=0.1$. The initial conditions are $u(x, t=0) = \mathrm{sech}(10x-5)$ and $\psi(x, t=0) = 0$. The figure convention is consistent with that of Figure 3.
Figure 5: The uncontrolled solution to the Schrödinger equation in a domain of length $L=1$ with parameters $\hbar = 1.0$, $m=1.0$, and $V=0.0$. The initial conditions are $\xi(x, t=0)=\mathrm{sech}(10x-5)$ and $\eta(x, t=0) = 0$. The figure convention is consistent with that of Figure 3.
...and 11 more figures
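Figures 2 and 3 show how distributed inputs enter a PDE and what the uncontrolled CDR solution looks like. The sketch below mimics that setup under several assumptions that are not taken from controlgym:

- The CDR equation is taken as u_t = ν u_xx - c u_x + r u + Σ_j Φ_j(x) a_j(t) on [0, 1] with periodic boundary conditions.
- Each Φ_j is an indicator of the stated width, centred at (j + 0.5)/m for m inputs.
- The solver is explicit Euler with central differences, which may differ from controlgym's own scheme.

support_matrix, simulate_cdr, and max_open_loop_growth are hypothetical helper names. On a periodic grid the semi-discrete operator is circulant, so its eigenvalues have the closed form λ_k = r - (4ν/Δx²) sin²(πk/n) - i (c/Δx) sin(2πk/n). The largest real part is r, attained at k = 0. Under these assumptions, the uncontrolled system is therefore unstable whenever r > 0.

```python
import math


def support_matrix(n_inputs, width, n_grid):
    """Column j samples a hypothetical indicator Phi_j on a cell-centred grid of [0, 1].

    Assumption: Phi_j = 1 on an interval of the given width centred at
    (j + 0.5) / n_inputs (equidistant centres), and 0 elsewhere.
    """
    xs = [(i + 0.5) / n_grid for i in range(n_grid)]
    centres = [(j + 0.5) / n_inputs for j in range(n_inputs)]
    return [[1.0 if abs(x - c) <= width / 2 else 0.0 for c in centres] for x in xs]


def simulate_cdr(action_fn, B, c=0.01, nu=0.002, r=0.1, dt=0.005, steps=1000):
    """Explicit Euler for u_t = nu*u_xx - c*u_x + r*u + sum_j Phi_j(x) a_j (periodic BCs)."""
    n = len(B)
    dx = 1.0 / n
    # Explicit Euler is stable for the diffusion term only if dt <= dx**2 / (2 * nu).
    assert dt <= dx ** 2 / (2 * nu), "dt too large for explicit Euler"
    xs = [(i + 0.5) * dx for i in range(n)]
    u = [1.0 / math.cosh(10 * x - 5) for x in xs]   # initial condition sech(10x - 5)
    history = [u[:]]
    for k in range(steps):
        a = action_fn(k, u)                           # control inputs a_j at step k
        f = [sum(phi * aj for phi, aj in zip(row, a)) for row in B]  # distributed forcing
        new = []
        for i in range(n):
            left, right = u[i - 1], u[(i + 1) % n]    # periodic neighbours
            diffusion = nu * (right - 2 * u[i] + left) / dx ** 2
            advection = -c * (right - left) / (2 * dx)
            new.append(u[i] + dt * (diffusion + advection + r * u[i] + f[i]))
        u = new
        history.append(u[:])
    return history


def max_open_loop_growth(n, c=0.01, nu=0.002, r=0.1):
    """Largest real part of the circulant operator's eigenvalues (closed form; c only shifts imaginary parts)."""
    dx = 1.0 / n
    return max(r - 4 * nu * math.sin(math.pi * k / n) ** 2 / dx ** 2 for k in range(n))


B = support_matrix(n_inputs=1, width=0.3, n_grid=200)
print(sum(row[0] for row in B) / len(B))          # fraction of the domain the input covers
zero = simulate_cdr(lambda k, u: [0.0], B)        # zero-controller trajectory
print(sum(zero[-1]) / sum(zero[0]))               # spatial sum grows by (1 + r*dt)**steps
print(max_open_loop_growth(200))
```

With periodic boundaries the difference terms sum to zero over the grid. Under zero control, the spatial sum therefore grows by exactly (1 + r·dt) per step, the discrete counterpart of the unstable k = 0 mode.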
