A modular framework for stabilizing deep reinforcement learning control
Nathan P. Lawrence, Philip D. Loewen, Shuyuan Wang, Michael G. Forbes, R. Bhushan Gopaluni
TL;DR
The paper addresses stability in RL-based control by embedding the Youla-Kučera parameterization, which restricts the search to stable operators, and by replacing the explicit plant model with a data-driven internal model built from input-output data. Stable nonlinear operators are learned with a Lyapunov-guided two-network design, and the Youla-Kučera framework is realized in a model-free manner by using Willems' fundamental lemma to relate recorded data to the closed-loop behavior. This allows standard RL algorithms to optimize over a stable $Q$-parameterization with the discounted-return objective $J(\pi) = \mathbb{E}_{h \sim p^{\pi}}[\sum_{t=0}^{\infty} \gamma^{t} r(s_t,a_t)]$, where $h$ is a trajectory generated under policy $\pi$ and $\gamma$ is the discount factor. The approach is demonstrated on a simulated two-tank system, where learning converges stably and performs favorably. Extensions to stochastic policies and unstable plants are left as avenues for future work.
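To make the data-driven ingredient concrete, the following is a minimal sketch of the Hankel-matrix construction and the persistency-of-excitation rank test that Willems' fundamental lemma relies on. It is not the paper's implementation: the function names `hankel_matrix` and `is_persistently_exciting`, the signal lengths, and the chosen orders are all illustrative assumptions.

```python
import numpy as np

def hankel_matrix(signal, depth):
    """Stack every length-`depth` window of `signal` as one column.

    `signal` has shape (T,) or (T, m); the result has shape
    (depth * m, T - depth + 1). Hypothetical helper, not from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.ndim == 1:
        signal = signal[:, None]  # treat as a single-channel signal
    T, _ = signal.shape
    cols = T - depth + 1
    if cols < 1:
        raise ValueError("signal is shorter than the requested depth")
    # Each column is one flattened window: [w_i, w_{i+1}, ..., w_{i+depth-1}].
    return np.column_stack(
        [signal[i:i + depth].reshape(-1) for i in range(cols)]
    )

def is_persistently_exciting(u, order):
    """Return True if the depth-`order` Hankel matrix of `u` has full row rank."""
    H = hankel_matrix(u, order)
    return np.linalg.matrix_rank(H) == H.shape[0]

# Assumed exploration signals, for illustration only.
rng = np.random.default_rng(0)
u_random = rng.standard_normal(50)   # rich input
u_constant = np.ones(50)             # uninformative input

random_ok = is_persistently_exciting(u_random, 6)
constant_ok = is_persistently_exciting(u_constant, 2)
```

A random input passes the rank test, while a constant input fails it because every column of its Hankel matrix is identical. Under the lemma, only data collected with a sufficiently exciting input can stand in for a model of a linear time-invariant system.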
Abstract
We propose a framework for the design of feedback controllers that combines the optimization-driven and model-free advantages of deep reinforcement learning with the stability guarantees provided by using the Youla-Kučera parameterization to define the search domain. Recent advances in behavioral systems allow us to construct a data-driven internal model; this enables an alternative realization of the Youla-Kučera parameterization based entirely on input-output exploration data. Using a neural network to express a parameterized set of nonlinear stable operators enables seamless integration with standard deep learning libraries. We demonstrate the approach on a realistic simulation of a two-tank system.
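The role of the internal model can be illustrated with a small simulation of the internal-model-control form of the Youla-Kučera parameterization for a stable plant. This is a sketch under strong simplifying assumptions, not the authors' method: the first-order plant, its coefficients, the exact (rather than data-driven) internal model, and the contractive map `q_step` standing in for a learned stable operator are all hypothetical.

```python
import numpy as np

def plant_step(y, u, a=0.9, b=0.1):
    """One step of an assumed stable first-order plant (|a| < 1)."""
    return a * y + b * u

def q_step(z, e):
    """Hypothetical stable operator Q: a contraction in z driven by e.

    z -> 0.5 * tanh(z) has Lipschitz constant 0.5, so bounded inputs
    give bounded states. The paper instead learns Q with a neural network.
    """
    return 0.5 * np.tanh(z) + e

def simulate(steps=200, setpoint=1.0, disturbance=0.2):
    """Run the loop u = Q(r - (y_measured - y_model))."""
    y = y_model = z = 0.0
    outputs, inputs = [], []
    for t in range(steps):
        # Output disturbance switches on halfway through the run.
        d = disturbance if t >= steps // 2 else 0.0
        y_measured = y + d
        # Only the model mismatch is fed back to Q.
        e = setpoint - (y_measured - y_model)
        z = q_step(z, e)
        u = z
        y = plant_step(y, u)
        y_model = plant_step(y_model, u)  # exact internal model (assumption)
        outputs.append(y + d)
        inputs.append(u)
    return np.array(outputs), np.array(inputs)

outputs, inputs = simulate()
```

Because the internal model cancels the plant's response to `u`, the signal entering `q_step` depends only on the setpoint and the disturbance. The loop therefore stays bounded for any stable choice of Q, which is why the search over Q can be left to a standard RL optimizer.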
