
Safety Reinforced Model Predictive Control (SRMPC): Improving MPC with Reinforcement Learning for Motion Planning in Autonomous Driving

Johannes Fischer, Marlon Steiner, Ömer Sahin Tas, Christoph Stiller

TL;DR

The paper tackles safe, real-time motion planning for autonomous driving by integrating safe reinforcement learning (SRL) with model predictive control (MPC). It introduces SRMPC, in which an SRL policy generates a safe reference trajectory that is used for linearization within a linear time-varying MPC (LTV-MPC). Safety is enforced through an energy-based safety index and a state-dependent Lagrangian multiplier. Empirical results in a highway scenario show that SRMPC outperforms both standalone SRL and LTV-MPC in safety and often in performance, particularly in light traffic, at the cost of some computational overhead from generating the SRL trajectory. The work points to extending the approach to nonlinear MPC (NMPC) and to incorporating safety formulations in the style of Responsibility-Sensitive Safety (RSS) to further strengthen real-time safety guarantees. Overall, SRMPC offers a practical way to improve both safety and efficiency in motion planning for autonomous driving by combining optimization with learned, safety-aware references.
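The key mechanism is that the SRL trajectory serves as the linearization point of the LTV-MPC (see also Figure 3). The sketch below illustrates that step under stated assumptions. It uses a rear-axle kinematic bicycle model with state (x, y, ψ, v), inputs (acceleration a, steering angle δ), forward-Euler discretization, and hypothetical values for wheelbase and step size. The paper's exact model, discretization, and QP solver are not reproduced. `toy_policy` is a hypothetical stand-in for the trained SRL agent.

```python
import math

WHEELBASE = 2.7  # [m], hypothetical vehicle parameter
DT = 0.1         # [s], hypothetical discretization step

def step(s, u, dt=DT, L=WHEELBASE):
    """One forward-Euler step of a rear-axle kinematic bicycle model.
    State s = [x, y, psi, v], input u = [a, delta]."""
    x, y, psi, v = s
    a, delta = u
    return [x + dt * v * math.cos(psi),
            y + dt * v * math.sin(psi),
            psi + dt * v * math.tan(delta) / L,
            v + dt * a]

def linearize(s_ref, u_ref, dt=DT, L=WHEELBASE):
    """Return (A, B, c) such that step(s, u) ~= A s + B u + c near (s_ref, u_ref)."""
    _, _, psi, v = s_ref
    _, delta = u_ref
    A = [[1.0, 0.0, -dt * v * math.sin(psi), dt * math.cos(psi)],
         [0.0, 1.0,  dt * v * math.cos(psi), dt * math.sin(psi)],
         [0.0, 0.0, 1.0, dt * math.tan(delta) / L],
         [0.0, 0.0, 0.0, 1.0]]
    B = [[0.0, 0.0],
         [0.0, 0.0],
         [0.0, dt * v / (L * math.cos(delta) ** 2)],  # d tan(delta)/d delta = 1/cos^2
         [dt, 0.0]]
    f_ref = step(s_ref, u_ref, dt, L)
    # Affine offset so that the linear model is exact at the reference point.
    c = [f_ref[i]
         - sum(A[i][j] * s_ref[j] for j in range(4))
         - sum(B[i][j] * u_ref[j] for j in range(2))
         for i in range(4)]
    return A, B, c

def rollout(policy, s0, horizon):
    """Roll a reference policy forward to obtain reference states and inputs."""
    states, inputs = [list(s0)], []
    for _ in range(horizon):
        u = policy(states[-1])
        inputs.append(u)
        states.append(step(states[-1], u))
    return states, inputs

def ltv_models(states, inputs):
    """One (A_k, B_k, c_k) triple per stage of the reference trajectory."""
    return [linearize(s, u) for s, u in zip(states[:-1], inputs)]

def predict(model, s, u):
    """Evaluate the affine stage model A s + B u + c."""
    A, B, c = model
    return [sum(A[i][j] * s[j] for j in range(4))
            + sum(B[i][j] * u[j] for j in range(2)) + c[i] for i in range(4)]

# Usage with a hypothetical stand-in policy (constant small steering angle);
# in SRMPC this role is played by the trained SRL agent.
def toy_policy(s):
    return [0.0, 0.02]

states, inputs = rollout(toy_policy, [0.0, 0.0, 0.0, 25.0], horizon=20)
models = ltv_models(states, inputs)  # would be handed to the LTV-MPC QP
```

Because the affine term c_k makes each stage model exact at the reference, an MPC that simply applied the reference inputs would reproduce the SRL trajectory. Deviations from it are then approximated to first order.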

Abstract

Model predictive control (MPC) is widely used for motion planning, particularly in autonomous driving. To run in real time, the planner has to rely on convex approximations of the optimal control problems (OCPs). However, such approximations confine the solution to a subspace that might not contain the global optimum. To address this, we propose using safe reinforcement learning (SRL) to obtain a new, safe reference trajectory within MPC. With this learning-based approach, the MPC can explore solutions beyond the close neighborhood of the previous one, potentially finding global optima. We incorporate constrained reinforcement learning (CRL) to ensure safety in automated driving, using a handcrafted, energy-function-based safety index as the constraint objective to model safe and unsafe regions. To solve the CRL problem, our approach uses a state-dependent Lagrangian multiplier that is learned concurrently with the safe policy. Through experiments in a highway scenario, we demonstrate that our approach outperforms both MPC and SRL in terms of safety and performance measures.
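The abstract combines an energy-function-based safety index with a state-dependent Lagrangian multiplier. The sketch below shows how these two pieces could fit together, and every specific in it is an assumption. The index uses the form common in the safe-set-algorithm literature, φ = σ + d_min^n − d^n − k·ḋ, with φ > 0 marking unsafe states. The per-step constraint φ(s′) ≤ max(φ(s) − η, 0) is assumed. The multiplier λ(s) is a hypothetical linear-softplus stand-in for a neural network. All parameter values are illustrative, not taken from the paper.

```python
import math

# Hypothetical safety-index parameters (not taken from the paper).
D_MIN = 5.0   # minimum admissible gap d_min [m]
SIGMA = 0.1   # safety margin sigma
N_EXP = 2.0   # exponent n on the distance terms
K_DOT = 1.0   # weight k on the gap rate d_dot
ETA = 0.1     # required decrease of phi per step while unsafe

def safety_index(d, d_dot):
    """Energy-style index phi = sigma + d_min^n - d^n - k*d_dot; phi > 0 is unsafe."""
    return SIGMA + D_MIN ** N_EXP - d ** N_EXP - K_DOT * d_dot

def constraint_cost(phi_now, phi_next):
    """Positive (a violation) when phi neither decreases by ETA nor stays <= 0."""
    return phi_next - max(phi_now - ETA, 0.0)

def softplus(z):
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # numerically stable

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class StateMultiplier:
    """lambda(s) = softplus(w . f(s) + b) >= 0; f(s) is a feature vector of the state."""
    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def _z(self, f):
        return sum(wi * fi for wi, fi in zip(self.w, f)) + self.b

    def __call__(self, f):
        return softplus(self._z(f))

    def ascent_step(self, f, cost):
        """Gradient ascent on lambda(s) * cost: lambda grows where the constraint is violated."""
        g = cost * sigmoid(self._z(f))  # d softplus(z) / dz = sigmoid(z)
        self.w = [wi + self.lr * g * fi for wi, fi in zip(self.w, f)]
        self.b += self.lr * g

def policy_objective(reward, cost, lam):
    """Per-sample Lagrangian the policy maximizes: reward - lambda(s) * cost."""
    return reward - lam * cost
```

The name PPO-L-SI in Figure 4 suggests a PPO-Lagrangian learner combined with the safety index. Under that reading, `policy_objective` would replace the plain reward term in the policy update, while `ascent_step` raises λ only in states where the constraint is violated and lowers it elsewhere. This is what makes the multiplier state-dependent rather than a single scalar.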

Paper Structure

This paper contains 27 sections, 17 equations, 4 figures, 2 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Bicycle model: The forces (red arrows) are not considered in the kinematic bicycle model, and the sideslip angles are assumed to be $\alpha_{f,r}=0$.
  • Figure 2: CMDP observation of the ego-vehicle (red): The ego-vehicle observes the vehicles directly in front of and behind it on each lane (shown in blue). The vehicles shown in gray are not considered.
  • Figure 3: SRL trajectory (dashed lines) used for linearization within the LTV-MPC and the locally optimal solution (continuous lines) of the MPC based on the SRL trajectory.
  • Figure 4: Training results for the baseline RL algorithm PPO and the SRL algorithm PPO-L-SI.
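Figure 2 describes which surrounding vehicles enter the CMDP (constrained Markov decision process) observation. The sketch below selects the nearest leader and follower on each lane. It assumes that vehicles are described by lane index, longitudinal position, and speed. Missing or out-of-range neighbors are padded using a hypothetical `SENSOR_RANGE`. The paper's actual observation features, normalization, and range are not reproduced.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SENSOR_RANGE = 100.0  # [m], hypothetical observation range used for padding

@dataclass
class Vehicle:
    lane: int   # lane index
    s: float    # longitudinal position along the road [m]
    v: float    # speed [m/s]

def neighbors_on_lane(ego: Vehicle, others: List[Vehicle], lane: int
                      ) -> Tuple[Optional[Vehicle], Optional[Vehicle]]:
    """Nearest vehicle ahead of and behind the ego on one lane (None if absent)."""
    ahead = [o for o in others if o.lane == lane and o.s >= ego.s]
    behind = [o for o in others if o.lane == lane and o.s < ego.s]
    lead = min(ahead, key=lambda o: o.s, default=None)
    follow = max(behind, key=lambda o: o.s, default=None)
    return lead, follow

def relative_state(ego: Vehicle, other: Optional[Vehicle], pad_sign: float
                   ) -> Tuple[float, float]:
    """(gap, relative speed); absent or out-of-range vehicles are padded."""
    if other is None or abs(other.s - ego.s) > SENSOR_RANGE:
        return (pad_sign * SENSOR_RANGE, 0.0)
    return (other.s - ego.s, other.v - ego.v)

def observe(ego: Vehicle, others: List[Vehicle], n_lanes: int) -> List[float]:
    """Flat observation: ego lane and speed, then leader and follower per lane.
    All other vehicles (gray in Figure 2) are ignored."""
    obs = [float(ego.lane), ego.v]
    for lane in range(n_lanes):
        lead, follow = neighbors_on_lane(ego, others, lane)
        obs.extend(relative_state(ego, lead, +1.0))
        obs.extend(relative_state(ego, follow, -1.0))
    return obs

ego = Vehicle(lane=1, s=0.0, v=30.0)
others = [Vehicle(1, 20.0, 25.0), Vehicle(1, 50.0, 25.0),
          Vehicle(0, -10.0, 32.0), Vehicle(2, 300.0, 30.0)]
print(observe(ego, others, n_lanes=3))
# [1.0, 30.0, 100.0, 0.0, -10.0, 2.0, 20.0, -5.0, -100.0, 0.0, 100.0, 0.0, -100.0, 0.0]
```

A fixed-length vector like this keeps the policy input size independent of traffic density. In the example, the second vehicle ahead on lane 1 and the far vehicle on lane 2 play the role of the gray vehicles.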