Safety Reinforced Model Predictive Control (SRMPC): Improving MPC with Reinforcement Learning for Motion Planning in Autonomous Driving
Johannes Fischer, Marlon Steiner, Ömer Sahin Tas, Christoph Stiller
TL;DR
The paper tackles safe, real-time motion planning for autonomous driving by integrating safe reinforcement learning (SRL) with model predictive control (MPC). It introduces SRMPC, in which SRL generates a safe reference trajectory that informs a linear time-varying MPC (LTV-MPC) optimization. Safety constraints are enforced through an energy-based safety index and a state-dependent Lagrangian multiplier. Empirical results on highway driving show that SRMPC outperforms standalone SRL and LTV-MPC in safety and often in performance, particularly in light traffic, at the cost of some computational overhead from SRL trajectory generation. The work suggests extending the approach to nonlinear MPC (NMPC) and incorporating RSS-style (Responsibility-Sensitive Safety) formulations to further strengthen real-time safety guarantees. Overall, SRMPC demonstrates a practical way to improve both safety and efficiency in autonomous driving motion planning by combining optimization with learned, safety-aware references.
Abstract
Model predictive control (MPC) is widely used for motion planning, particularly in autonomous driving. To run in real time, the planner must rely on convex approximations of optimal control problems (OCPs). However, such approximations confine the solution to a subspace, which might not contain the global optimum. To address this, we propose using safe reinforcement learning (SRL) to obtain a new and safe reference trajectory within MPC. By employing a learning-based approach, the MPC can explore solutions beyond the close neighborhood of the previous one, potentially finding global optima. We incorporate constrained reinforcement learning (CRL) to ensure safety in automated driving, using a handcrafted energy function-based safety index as the constraint objective to model safe and unsafe regions. Our approach utilizes a state-dependent Lagrangian multiplier, learned concurrently with the safe policy, to solve the CRL problem. In experiments in a highway scenario, we demonstrate the superiority of our approach over both MPC and SRL in terms of safety and performance measures.
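The constrained formulation described in the abstract can be sketched in a generic form. This is a common way to write constrained RL with a state-dependent multiplier, not necessarily the paper's exact notation. All symbols are assumptions introduced here for illustration: the policy \pi, reward r, discount \gamma, reward value V_r^{\pi}, a constraint value V_c^{\pi} built from the safety index, the threshold d, the state distribution \rho, and the multiplier \lambda(s).

```latex
% Constrained RL (assumed generic form): maximize the expected return
% subject to a bound d on a safety-index-based constraint value in every state.
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right]
\quad \text{s.t.} \quad V_c^{\pi}(s) \le d \quad \text{for all states } s

% Lagrangian relaxation with a state-dependent multiplier \lambda(s) \ge 0,
% which is learned alongside the policy.
\min_{\lambda \ge 0} \; \max_{\pi} \;
\mathbb{E}_{s \sim \rho}\!\left[ V_r^{\pi}(s) - \lambda(s)\,\bigl( V_c^{\pi}(s) - d \bigr) \right]
```

Under this reading, a large \lambda(s) penalizes the policy heavily in states where the constraint is violated, while \lambda(s) can shrink toward zero in states where the constraint is satisfied with margin.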
