Safe Reinforcement Learning for Real-World Engine Control
Julian Bedei, Lucas Koch, Kevin Badalian, Alexander Winkler, Patrick Schaber, Jakob Andert
TL;DR
This work tackles the challenge of deploying reinforcement learning (RL) in safety-critical real-world environments by introducing a safe RL toolchain based on Deep Deterministic Policy Gradient (DDPG) for Homogeneous Charge Compression Ignition (HCCI) engine control. Safety is achieved through a dynamic measurement algorithm that maps the experimental space and a k-NN-based safety monitor that can replace unsafe actions during real-time operation. The approach is validated on a single-cylinder HCCI testbench, achieving a root mean square error (RMSE) of $0.1374$ bar for the indicated mean effective pressure (IMEP) and enabling online adaptation toward higher ethanol energy shares while maintaining safety, with convergence within tens of thousands of cycles. The results demonstrate that safe RL can match or surpass ANN-based references in accuracy and substantially improve safety performance, paving the way for deploying RL in real-world, safety-critical robotic, automotive, and aerospace systems.
Abstract
This work introduces a toolchain for applying Reinforcement Learning (RL), specifically the Deep Deterministic Policy Gradient (DDPG) algorithm, in safety-critical real-world environments. As an exemplary application, transient load control is demonstrated on a single-cylinder internal combustion engine testbench in Homogeneous Charge Compression Ignition (HCCI) mode, which offers high thermal efficiency and low emissions. However, HCCI poses challenges for traditional control methods due to its nonlinear, autoregressive, and stochastic nature. RL provides a viable solution, but safety concerns, such as excessive pressure rise rates, must be addressed when applying it to HCCI. A single unsuitable control input can severely damage the engine or cause misfiring and shutdown. Additionally, operating limits are not known a priori and must be determined experimentally. To mitigate these risks, real-time safety monitoring based on the k-nearest neighbor algorithm is implemented, enabling safe interaction with the testbench. The feasibility of this approach is demonstrated as the RL agent learns a control policy through interaction with the testbench. A root mean square error of 0.1374 bar is achieved for the indicated mean effective pressure, comparable to neural network-based controllers from the literature. The toolchain's flexibility is further demonstrated by adapting the agent's policy to increase ethanol energy shares, promoting renewable fuel use while maintaining safety. This approach addresses the longstanding challenge of applying RL to safety-critical real-world environments. The developed toolchain, with its adaptability and safety mechanisms, paves the way for future applicability of RL in engine testbenches and other safety-critical settings.
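To illustrate the idea of a k-nearest-neighbor safety monitor that replaces unsafe actions, the following is a minimal sketch, not the toolchain's actual implementation. The feature layout (state concatenated with action), the parameters `k` and `max_dist`, the fallback action, and all example data are assumptions made purely for illustration; the abstract does not specify them.

```python
import math

def knn_is_safe(candidate, visited, labels, k=3, max_dist=0.5):
    """Hypothetical check: the k nearest previously measured points must all
    be labelled safe, and the k-th neighbor must lie within max_dist."""
    # Pair each measured point's distance to the candidate with its safety label.
    dists = sorted((math.dist(candidate, p), safe) for p, safe in zip(visited, labels))
    nearest = dists[:k]
    if len(nearest) < k:
        return False  # too little data to judge: treat as unsafe
    all_safe = all(safe for _, safe in nearest)
    close_enough = nearest[-1][0] <= max_dist
    return all_safe and close_enough

def monitor(action, state, visited, labels, fallback, k=3, max_dist=0.5):
    """Pass the agent's action through if it is judged safe; otherwise
    replace it with an assumed known-safe fallback action."""
    candidate = tuple(state) + tuple(action)
    if knn_is_safe(candidate, visited, labels, k, max_dist):
        return action
    return fallback

# Illustrative (made-up) measured points in a normalized (state, action) space.
visited = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0)]
labels = [True, True, True, False]  # False marks a point that violated a limit

safe_action = monitor((0.05,), (0.05,), visited, labels, fallback=(0.0,))
replaced_action = monitor((0.9,), (0.9,), visited, labels, fallback=(0.0,))
```

In this sketch, a proposed action near the cluster of safe measurements is passed through unchanged, while one near the limit-violating point is replaced by the fallback. Requiring the k-th neighbor to be within `max_dist` is one possible way to also reject actions in regions that have not yet been measured.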
