Reinforcement Learning with Ensemble Model Predictive Safety Certification
Sven Gronauer, Tom Haider, Felippe Schmoeller da Roza, Klaus Diepold
TL;DR
The paper tackles safe exploration in deep reinforcement learning for safety-critical tasks by introducing Ensemble Model Predictive Safety Certification (X-MPSC). X-MPSC combines an ensemble of probabilistic dynamics models with tube-based model predictive control to certify, and if necessary modify, the learner's actions so that the planned trajectory satisfies the safety constraints over the planning horizon. The method incurs substantially fewer constraint violations than competitive baselines, and adding a coarse prior dynamics model can reduce violations by an order of magnitude without harming performance. The approach relies on offline data from a safe backup controller to bootstrap training. It uses ellipsoidal uncertainty sets and recursive feasibility to maintain safety during learning, offering a practical path toward safe real-world deployment in robotics and related domains.
Abstract
Reinforcement learning algorithms need exploration to learn. However, unsupervised exploration prevents the deployment of such algorithms on safety-critical tasks and thus limits their real-world use. In this paper, we propose a new algorithm called Ensemble Model Predictive Safety Certification that combines model-based deep reinforcement learning with tube-based model predictive control to correct the actions taken by a learning agent, keeping safety constraint violations at a minimum through planning. Our approach aims to reduce the amount of prior knowledge required about the actual system by requiring only offline data generated by a safe controller. Our results show that we can achieve significantly fewer constraint violations than comparable reinforcement learning methods.
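The certification idea summarized above can be illustrated with a small sketch. This is not the paper's implementation: it replaces the tube-based MPC optimization with a simple accept-or-fall-back rollout check, uses linear models in place of learned probabilistic networks, and approximates constraint tightening with a fixed scalar `margin`. The function name `certify_action`, the linear backup gain, and the box constraint `x_max` are all assumptions made for illustration.

```python
import numpy as np

def certify_action(x, a_learner, models, backup_gain, x_max, horizon=10, margin=0.1):
    """Accept the learner's action only if every ensemble member predicts a safe rollout.

    Hypothetical simplification: each model is a pair (A, B) of a linear system
    x' = A x + B u, the backup controller is u = -K x, and the safe set is the
    box |x| <= x_max, tightened by `margin` as a crude stand-in for a tube.
    Returns (action_to_apply, certified_flag).
    """
    def rollout_is_safe(first_action):
        for A, B in models:
            state = np.array(x, dtype=float)
            action = np.array(first_action, dtype=float)
            for _ in range(horizon):
                state = A @ state + B @ action
                # Check the tightened box constraint on the predicted state.
                if np.any(np.abs(state) > x_max - margin):
                    return False
                # After the first step, follow the safe backup controller.
                action = -backup_gain @ state
        return True

    if rollout_is_safe(a_learner):
        return np.asarray(a_learner, dtype=float), True
    # Otherwise fall back to the backup controller at the current state.
    return -backup_gain @ np.asarray(x, dtype=float), False


# Illustrative scalar system with three slightly different ensemble members.
models = [(np.array([[a]]), np.array([[0.1]])) for a in (0.95, 1.0, 1.05)]
K = np.array([[5.0]])
x0 = np.array([0.5])

mild_action, mild_ok = certify_action(x0, np.array([0.0]), models, K, x_max=1.0)
aggressive_action, aggressive_ok = certify_action(x0, np.array([10.0]), models, K, x_max=1.0)
```

In this toy setting the mild action is passed through unchanged, while the aggressive one is rejected because all ensemble members predict a state outside the tightened box after one step. Requiring agreement across the whole ensemble is one conservative way to account for model uncertainty; the method described in the paper instead plans with tube-based MPC.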
