Safe Reinforcement Learning Using Robust MPC
Mario Zanon, Sébastien Gros
TL;DR
This paper tackles safety in reinforcement learning by integrating robust model predictive control (MPC) as a function-approximation mechanism within RL. The authors introduce Safe RL-MPC, which uses a tube-based robust MPC with a parametrizable uncertainty set and a Safe Design Constraint to guarantee constraint satisfaction during both learning and deployment. Key contributions include a data-efficient approach to manage large data streams via a nominal linear model and convex-hull data compression, a differentiable MPC scheme for gradient-based RL updates, and algorithms for safe exploration and recursive feasibility. The framework is demonstrated on a linear system and a nonlinear evaporation process, showing how RL can adapt the uncertainty set and the MPC parameters to improve performance while preserving safety. The work provides a foundation for extending safe RL with scenario trees, stochastic gradients, and hybrid cost formulations for broader real-world applications.
Abstract
Reinforcement Learning (RL) has recently impressed the world with stunning results in various applications. While the potential of RL is now well established, many critical aspects still need to be tackled, including safety and stability issues. These issues, while partially neglected by the RL community, are central to the control community, which has investigated them extensively. Model Predictive Control (MPC) is one of the most successful control techniques, in part because of its ability to provide such guarantees even for uncertain constrained systems. Since MPC is an optimization-based technique, optimality has also often been claimed. Unfortunately, the performance of MPC is highly dependent on the accuracy of the model used for predictions. In this paper, we propose to combine RL and MPC in order to exploit the advantages of both and, therefore, obtain a controller which is optimal and safe. We illustrate the results with a numerical example in simulation.
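The convex-hull data compression mentioned in the TL;DR can be illustrated with a minimal sketch. It assumes a nominal linear model x+ = A x + B u + w with two states and one input, and it assumes that only the extreme disturbance samples matter when checking whether a convex uncertainty set contains all observed data. The function names (`residual`, `convex_hull`, `compress_disturbances`) and the choice of Andrew's monotone chain algorithm are illustrative assumptions, not taken from the paper.

```python
def residual(A, B, x, u, x_next):
    """Disturbance sample w = x_next - (A x + B u) for a 2-state, 1-input model."""
    return tuple(
        x_next[i] - (A[i][0] * x[0] + A[i][1] * x[1] + B[i] * u)
        for i in range(2)
    )


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Vertices of the 2D convex hull (Andrew's monotone chain), counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def compress_disturbances(A, B, transitions):
    """Keep only the hull vertices of the disturbance samples.

    transitions: iterable of (x, u, x_next) tuples (hypothetical data layout).
    """
    samples = [residual(A, B, x, u, x_next) for x, u, x_next in transitions]
    return convex_hull(samples)


# Hypothetical example data: a double-integrator-like nominal model.
A = [[1, 1], [0, 1]]
B = [0, 1]
observed = [(2, 0), (0, 2), (-2, 0), (0, -2), (0, 0), (1, 0), (0, 1)]
transitions = [((0, 0), 0, w) for w in observed]  # x = 0, u = 0, so w = x_next
vertices = compress_disturbances(A, B, transitions)
print(len(transitions), "samples compressed to", len(vertices), "vertices")
```

In this toy data set, the three interior samples are discarded and only the four extreme points are stored. Any convex set that contains those four vertices also contains every discarded sample, which is why the stored data can stay small even as the data stream grows.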
