Stabilizing reinforcement learning control: A modular framework for optimizing over all stable behavior
Nathan P. Lawrence, Philip D. Loewen, Shuyuan Wang, Michael G. Forbes, R. Bhushan Gopaluni
TL;DR
This work addresses the challenge of guaranteeing stability in reinforcement learning (RL)-based control by embedding the Youla–Kučera parameterization into a data-driven framework. The authors construct a data-driven internal model $P$ of the plant from Hankel matrices of input-output data via Willems' fundamental lemma. A stable operator $Q$ then parameterizes the controller through $K(z)=Q(z)/(1-P(z)Q(z))$ and thereby governs the closed-loop behavior; $Q$ admits both linear and nonlinear realizations, and the same idea supports fixed-structure tuning. The authors establish stability criteria for Hankel-based models under noise, provide probabilistic bounds for random Hankel matrices, and develop Lyapunov-based methods for training stable $Q$ operators that plug into RL algorithms in a modular way. Simulation studies on an industrial tank and on fixed-structure controller tuning demonstrate improved stability and performance, illustrating the framework's potential for safe, data-driven control of process systems. Overall, the modular approach decouples the learning algorithm, the function approximator, and the dynamic model, offering a scalable path to stable, data-driven RL in linear, nonlinear, and MIMO settings.
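The following minimal NumPy sketch, which is not taken from the paper, checks the closed-loop property behind $K(z)=Q(z)/(1-P(z)Q(z))$ for a hypothetical stable first-order plant $P$ and stable parameter $Q$, each written as a ratio of polynomials in $z$. It assumes the negative-feedback convention $u = K(r - y)$. Under that convention, the reference-to-output map is $T = PK/(1+PK) = PQ$, so the closed-loop poles come only from $P$ and $Q$, even though in this example $K$ itself has a pole outside the unit circle. All coefficient values are illustrative assumptions.

```python
import numpy as np

# Hypothetical stable plant and Youla parameter (coefficients in descending powers of z).
b_p, a_p = np.array([0.5]), np.array([1.0, -0.8])   # P(z) = 0.5 / (z - 0.8)
b_q, a_q = np.array([0.4]), np.array([1.0, -0.5])   # Q(z) = 0.4 / (z - 0.5)

# Controller K = Q / (1 - P Q) = (b_q a_p) / (a_p a_q - b_p b_q)
k_num = np.polymul(b_q, a_p)
k_den = np.polysub(np.polymul(a_p, a_q), np.polymul(b_p, b_q))

# Loop transfer L = P K and closed loop T = L / (1 + L), formed without pole-zero cancellation
l_num = np.polymul(b_p, k_num)
l_den = np.polymul(a_p, k_den)
t_num = l_num
t_den = np.polyadd(l_den, l_num)

print("controller poles:", np.roots(k_den))    # K itself need not be stable
print("closed-loop poles:", np.roots(t_den))   # should be the roots of a_p**2 * a_q

# T should equal P Q: cross-multiplied numerators and denominators must agree
same_as_pq = np.allclose(np.polymul(t_num, np.polymul(a_p, a_q)),
                         np.polymul(t_den, np.polymul(b_p, b_q)))
print("T == P Q:", same_as_pq)
```

The polynomials are kept uncancelled on purpose: the extra factor of $a_p$ in the closed-loop denominator makes it visible that no new poles appear beyond those of $P$ and $Q$.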
Abstract
We propose a framework for the design of feedback controllers that combines the optimization-driven and model-free advantages of deep reinforcement learning with the stability guarantees provided by using the Youla–Kučera parameterization to define the search domain. Recent advances in behavioral systems allow us to construct a data-driven internal model; this enables an alternative realization of the Youla–Kučera parameterization based entirely on input-output exploration data. Perhaps of independent interest, we formulate and analyze the stability of such data-driven models in the presence of noise. The Youla–Kučera approach requires a stable "parameter" for controller design. For the training of reinforcement learning agents, the set of all stable linear operators is given explicitly through a matrix factorization approach. Moreover, a nonlinear extension is given using a neural network to express a parameterized set of stable operators, which enables seamless integration with standard deep learning libraries. Finally, we show how these ideas can also be applied to tune fixed-structure controllers.
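To illustrate how a matrix factorization can turn unconstrained parameters into a stable linear operator, here is a minimal NumPy sketch. It assumes the form $A = S^{-1} U B S$, with $S$ symmetric positive definite, $U$ orthogonal, and $B$ positive semidefinite with spectral norm below 1. Because $A$ is similar to $UB$, its spectral radius is at most $\|B\|_2 < 1$, so $A$ is Schur stable by construction. This sketch only exercises that sufficient direction. It is not claimed to be the exact parameterization used in the paper, and the helper `stable_matrix` and the vectors `b_in`, `c_out` are hypothetical.

```python
import numpy as np

def stable_matrix(s_raw, u_raw, b_raw, eps=1e-3):
    """Hypothetical helper: map unconstrained square matrices to A = S^{-1} U B S."""
    n = s_raw.shape[0]
    S = s_raw @ s_raw.T + eps * np.eye(n)     # symmetric positive definite
    U, _ = np.linalg.qr(u_raw)                # orthogonal factor
    B = b_raw @ b_raw.T                       # positive semidefinite
    B = B / (1.0 + np.linalg.norm(B, 2))      # spectral norm now strictly below 1
    return np.linalg.solve(S, U @ B @ S)      # S^{-1} (U B S) without forming S^{-1}

rng = np.random.default_rng(0)
n = 4
spectral_radii = []
for _ in range(1000):
    A = stable_matrix(*(rng.standard_normal((n, n)) for _ in range(3)))
    spectral_radii.append(np.max(np.abs(np.linalg.eigvals(A))))
print("largest spectral radius over samples:", max(spectral_radii))

# Use the last A as the state matrix of a linear parameter Q:
#   x[k+1] = A x[k] + b_in e[k],  v[k] = c_out x[k]
b_in = rng.standard_normal(n)
c_out = rng.standard_normal(n)
# Impulse response (Markov parameters) c_out A^k b_in, which decays because rho(A) < 1
markov = [c_out @ np.linalg.matrix_power(A, k) @ b_in for k in range(200)]
print("last impulse-response magnitude:", abs(markov[-1]))
```

Since stability holds for every value of the raw matrices, they could in principle be updated freely by gradient-based training without a projection step; how the paper itself trains its parameters is not reproduced here.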
