
ReACT: Reinforcement Learning for Controller Parametrization using B-Spline Geometries

Thomas Rudolf, Daniel Flögel, Tobias Schürmann, Simon Süß, Stefan Schwab, Sören Hohmann

TL;DR

The paper tackles the costly task of parametrizing fixed-structure gain-scheduling controllers for nonlinear, parameter-varying systems. It introduces ReACT, a deep reinforcement learning (DRL) framework that represents high-dimensional controller parameters with $N$-dimensional B-spline geometries (BSGs), processes time-series observations with an LSTM, and updates the BSG control points through actions that optimize the closed-loop objective $J$, using off-policy methods such as truncated quantile critics (TQC) with actor regularization. Key contributions include efficient BSG-based parameter spaces, a self-competition reward, and ablations showing that dropout and layer normalization in the actor improve stability and robustness. The approach is validated on a parameter-variant first-order-plus-dead-time (FOPDT) plant, demonstrating faster, more robust adaptation of PI gains and improved tracking under noise, and it points toward automatic controller parametrization and deployment at industrial scale.
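To make the closed-loop objective $J$ concrete, the following Python sketch simulates a PI controller on a parameter-variant FOPDT plant and returns a scalar cost. It is an illustration under stated assumptions, not the paper's benchmark: the operating-point-dependent plant gain $K(y) = 1 + 0.5y$, the time constant, the dead time, the explicit-Euler discretization, and the choice of the integral of absolute error (IAE) as $J$ are all hypothetical, as is the function name `simulate_fopdt_pi`.

```python
from collections import deque


def simulate_fopdt_pi(k_p, k_i, setpoint=1.0, t_end=50.0, dt=0.01,
                      time_constant=5.0, dead_time=1.0):
    """Simulate a PI loop on a hypothetical parameter-varying FOPDT plant.

    Assumed plant: T * dy/dt = -y + K(y) * u(t - theta), with K(y) = 1 + 0.5 * y.
    Returns an assumed objective J: the integral of absolute error (IAE).
    """
    n_delay = int(round(dead_time / dt))
    # FIFO buffer that holds past control signals to realize the dead time.
    delay_line = deque([0.0] * n_delay, maxlen=n_delay) if n_delay else None
    y, integral, iae = 0.0, 0.0, 0.0
    for _ in range(int(t_end / dt)):
        e = setpoint - y
        integral += e * dt
        u = k_p * e + k_i * integral          # PI control law
        if delay_line is not None:
            u_delayed = delay_line[0]         # oldest entry = u(t - theta)
            delay_line.append(u)              # newest in, oldest drops out
        else:
            u_delayed = u
        gain = 1.0 + 0.5 * y                  # operating-point-dependent gain
        y += dt * (-y + gain * u_delayed) / time_constant  # explicit Euler step
        iae += abs(e) * dt
    return iae
```

A lower return value means better tracking, so an agent adapting $k_P$ and $k_I$ would aim to decrease it; for example, `simulate_fopdt_pi(1.0, 0.2)` scores better than the sluggish gains in `simulate_fopdt_pi(0.1, 0.01)`.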

Abstract

Robust and performant controllers are essential for industrial applications. However, deriving controller parameters for complex and nonlinear systems is challenging and time-consuming. To facilitate automatic controller parametrization, this work presents a novel approach using deep reinforcement learning (DRL) with $N$-dimensional B-spline geometries (BSGs). We focus on the control of parameter-variant systems, a class of systems whose complex behavior depends on the operating conditions. For this system class, gain-scheduling control structures are widely used across industries because their design principles are well known. To facilitate the expensive parametrization of these control structures, we deploy a DRL agent. Based on control system observations, the agent autonomously decides how to adapt the controller parameters. We make this adaptation more efficient by introducing BSGs to map controller parameters that may depend on numerous operating conditions. To preprocess time-series data and extract a fixed-length feature vector, we use a long short-term memory (LSTM) neural network. Furthermore, this work contributes actor regularizations that are relevant when real-world environments differ from the training environment. Accordingly, we apply dropout and layer normalization to the actor and critic networks of the truncated quantile critics (TQC) algorithm. To show our approach's working principle and effectiveness, we train and evaluate the DRL agent on the parametrization task of an industrial control structure with parameter lookup tables.
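The idea of mapping operating-condition-dependent controller parameters through BSGs can be written as a tensor-product B-spline, $S(\mathrm{v}_0, \dots, \mathrm{v}_{N-1}) = \sum_{i_0, \dots, i_{N-1}} \big(\prod_{n} B_{i_n}(\mathrm{v}_n)\big)\, p_{i_0 \dots i_{N-1}}$, where the agent's actions shift the control points $p$. The Python below is a minimal sketch under assumptions that are not from the paper: clamped uniform knot vectors on normalized coordinates in $[0, 1]$, scalar control points, and a clipped incremental update $p \leftarrow \mathrm{clip}(p + \text{step} \cdot a)$. The function names, step size, and bounds are hypothetical.

```python
import itertools


def clamped_knots(n_ctrl, degree):
    """Clamped uniform knot vector on [0, 1] for n_ctrl control points."""
    n_inner = n_ctrl - degree - 1
    inner = [(i + 1) / (n_inner + 1) for i in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def basis(i, k, t, knots):
    """Cox-de Boor recursion: i-th B-spline basis function of degree k at t."""
    if k == 0:
        if knots[i] <= t < knots[i + 1]:
            return 1.0
        # Close the last non-empty span so that t = 1 is covered as well.
        return 1.0 if t == knots[-1] and knots[i] < knots[i + 1] == t else 0.0
    left = right = 0.0
    den = knots[i + k] - knots[i]
    if den > 0:
        left = (t - knots[i]) / den * basis(i, k - 1, t, knots)
    den = knots[i + k + 1] - knots[i + 1]
    if den > 0:
        right = (knots[i + k + 1] - t) / den * basis(i + 1, k - 1, t, knots)
    return left + right


def evaluate_bsg(ctrl_points, shape, degree, v):
    """Evaluate an N-dimensional tensor-product BSG at normalized point v.

    ctrl_points maps index tuples (i_0, ..., i_{N-1}) to scalar CPs;
    shape gives the number of CPs per dimension (each must exceed degree).
    """
    knot_vecs = [clamped_knots(n, degree) for n in shape]
    # Per-dimension basis values B_{i_n}(v_n).
    bases = [[basis(i, degree, v_n, kv) for i in range(n)]
             for n, v_n, kv in zip(shape, v, knot_vecs)]
    value = 0.0
    for idx in itertools.product(*(range(n) for n in shape)):
        weight = 1.0
        for dim, i in enumerate(idx):
            weight *= bases[dim][i]
        value += weight * ctrl_points[idx]
    return value


def apply_action(ctrl_points, action, step=0.1, lower=0.0, upper=10.0):
    """Hypothetical incremental CP update, clipped to [lower, upper]."""
    return {idx: min(upper, max(lower, p + step * action.get(idx, 0.0)))
            for idx, p in ctrl_points.items()}
```

Because clamped B-splines interpolate their corner control points and their basis functions sum to one, a constant control grid yields a constant map, and shifting one corner CP changes the map most strongly near that corner. A two-dimensional instance would be a $k_P$ lookup table over two scheduling variables.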

Paper Structure (11 sections, 18 equations, 7 figures, 1 table, 1 algorithm)

Figures (7)

  • Figure 1: We propose ReACT to effectively learn parametrization strategies for fixed control structures that depend on operating conditions. Our DRL agent observes closed-loop signals $\bm{o}$ and acts through parameter adaptations $\bm{a}$ using B-spline geometries (BSGs). ReACT improves the controller parameters $\bm{\phi}$ toward the control performance objective $J$.
  • Figure 2: An exemplary oscillating curve (left, blue) approximated by a univariate BSG-curve $S_1(\bm{\mathrm{v}}_0)$ through its basis functions (gray), and a BSG-surface $S_2(\bm{\mathrm{v}}_0, \bm{\mathrm{v}}_1)$ spanned by a bivariate BSG (right). The control points (CPs) $\bm{p}_{i_n}$ (yellow dots) shape the geometries accordingly.
  • Figure 3: Structure of the proposed ReACT agent which observes closed-loop time series data $\bm{\zeta}(t)$ concatenated with stationary information $\bm{c}_\text{s}$. The extracted feature vector $\bm{z}$ represents the latent environment state. The agent's actions $\bm{a}$ incrementally adjust the B-spline CPs of controller parameters $\phi_l$.
  • Figure 4: Exemplary closed-loop PI control structure with the parameter-varying system $\bm{\Sigma}$ and gain-scheduling lookup tables $\bm{\phi} = \left\{ k_P, k_I \right\}$.
  • Figure 5: BSG-surface CPs (yellow) and the subset selected for adjustment by our ReACT agent (green). The selection is defined by the mean and the standard deviation along each of the dependencies $\bm{w}$.
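Figure 5 only states that the adjusted CPs are chosen around the mean and standard deviation of each dependency in $\bm{w}$. One way to make that rule concrete, sketched here in Python, is to assign each CP a location on its axis and keep those lying within $\mu \pm \sigma$ of the observed operating points. Using Greville abscissae as CP locations, the population standard deviation, dependencies normalized to $[0, 1]$, and a nearest-to-mean fallback are all assumptions, and `select_ctrl_points` is a hypothetical name.

```python
import itertools
import statistics


def clamped_knots(n_ctrl, degree):
    """Clamped uniform knot vector on [0, 1] for n_ctrl control points."""
    n_inner = n_ctrl - degree - 1
    inner = [(i + 1) / (n_inner + 1) for i in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def greville(n_ctrl, degree):
    """Greville abscissae: a common choice of parameter location per CP."""
    knots = clamped_knots(n_ctrl, degree)
    return [sum(knots[i + 1:i + degree + 1]) / degree for i in range(n_ctrl)]


def select_ctrl_points(samples, shape, degree):
    """Return CP index tuples inside mean +/- std of the observed dependencies.

    samples: list of normalized operating points (w_0, ..., w_{N-1}) in [0, 1].
    """
    per_dim = []
    for dim, n in enumerate(shape):
        values = [s[dim] for s in samples]
        mu = statistics.fmean(values)
        sigma = statistics.pstdev(values)
        locs = greville(n, degree)
        chosen = [i for i, x in enumerate(locs) if mu - sigma <= x <= mu + sigma]
        if not chosen:  # fall back to the CP located closest to the mean
            chosen = [min(range(n), key=lambda i: abs(locs[i] - mu))]
        per_dim.append(chosen)
    # Tensor product of the per-dimension selections.
    return list(itertools.product(*per_dim))
```

A plausible motivation for such a selection is that it keeps the action dimension small and leaves CPs in rarely visited operating regions untouched.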