
A Safe Reinforcement Learning driven Weights-varying Model Predictive Control for Autonomous Vehicle Motion Control

Baha Zarrouki, Marios Spanakakis, Johannes Betz

TL;DR

This work tackles the challenge of selecting MPC cost-function weights that remain effective under varying operating conditions while ensuring safety. It introduces a two-step framework: first, Multiobjective Bayesian Optimization (MOBO) computes a catalog of Pareto-optimal weight sets; second, a Look-ahead Deep Reinforcement Learning (RL) agent selects discrete actions from this catalog to drive a Weights-varying NMPC (WMPC) in real time. The key contributions are the safe RL design that avoids learning in a continuous, potentially unsafe space by restricting choices to pre-optimized weights, and the demonstration that RL-WMPC can achieve closed-loop performance beyond the Pareto front on a full-scale autonomous vehicle, with demonstrated generalization to unseen tracks. Overall, the approach significantly reduces manual tuning while providing online adaptability and safety guarantees, offering practical impact for robust autonomous-vehicle motion control. The methodology combines MOBO for safe weight catalog construction with a discrete-action RL policy that anticipates future tasks via look-ahead trajectories, enabling improved trajectory tracking and robustness across diverse driving scenarios.
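The first step ends with a catalog that contains only Pareto-optimal weight sets. A minimal sketch of that final filtering idea, assuming every objective is minimized: it does not reproduce the Bayesian optimization itself, and the candidate weights, cost values, and function names are hypothetical placeholders rather than anything taken from the paper.

```python
def dominates(a, b):
    """True if cost vector a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_catalog(candidates):
    """Keep the (weights, costs) pairs whose costs no other candidate dominates."""
    return [
        (weights, costs)
        for weights, costs in candidates
        if not any(dominates(other, costs) for _, other in candidates)
    ]


# Hypothetical candidates: (MPC weight set, closed-loop costs to minimize).
# Costs are placeholders standing in for (lateral RMSE, velocity RMSE).
candidates = [
    ((1.0, 1.0), (0.30, 0.30)),
    ((5.0, 1.0), (0.10, 0.40)),
    ((1.0, 5.0), (0.40, 0.10)),
    ((3.0, 3.0), (0.35, 0.35)),  # dominated by the first candidate
]

catalog = [weights for weights, _ in pareto_catalog(candidates)]
print(catalog)
```

Each surviving entry represents a different trade-off between the objectives, which is what later gives the agent a meaningful choice among them.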

Abstract

Determining the optimal cost function parameters of Model Predictive Control (MPC) to optimize multiple control objectives is a challenging and time-consuming task. Multiobjective Bayesian Optimization (BO) techniques solve this problem by determining a Pareto-optimal parameter set for an MPC with static weights. However, a single parameter set may not deliver the best closed-loop control performance when the MPC operating conditions change during operation, which creates the need to adapt the cost function weights at runtime. Deep Reinforcement Learning (RL) algorithms can automatically learn context-dependent optimal parameter sets and dynamically adapt them for a Weights-varying MPC (WMPC). However, learning cost function weights from scratch in a continuous action space may lead to unsafe operating states. To solve this, we propose a novel approach that limits the RL actions to a safe learning space representing a catalog of pre-optimized BO Pareto-optimal weight sets. We design an RL agent not to learn in a continuous space but to proactively anticipate upcoming control tasks and to choose, depending on the context, the best discrete action, each action corresponding to a single set of Pareto-optimal weights. Hence, even an untrained RL agent guarantees a safe and optimal performance. Experimental results demonstrate that an untrained RL-WMPC shows Pareto-optimal closed-loop behavior and that training the RL-WMPC yields performance beyond the Pareto front.
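The safety argument rests on the action space: every discrete action is an index into the pre-optimized catalog, so any choice, even a random one, returns a vetted weight set. A minimal sketch of that mapping follows. The epsilon-greedy rule is a simple stand-in and not the paper's RL algorithm, and the catalog values, per-action scores, and function name are hypothetical.

```python
import random

# Hypothetical catalog of pre-optimized weight sets (placeholder values).
CATALOG = [(1.0, 1.0), (5.0, 1.0), (1.0, 5.0)]


def select_weights(scores, epsilon, rng):
    """Pick a catalog index: random with probability epsilon, else the best-scored one."""
    if rng.random() < epsilon:
        action = rng.randrange(len(CATALOG))
    else:
        action = max(range(len(CATALOG)), key=lambda i: scores[i])
    return action, CATALOG[action]


rng = random.Random(0)
# A fully exploratory (untrained-like) choice still lands inside the catalog.
_, untrained_weights = select_weights([0.0, 0.0, 0.0], epsilon=1.0, rng=rng)
# A greedy choice follows the hypothetical per-action scores.
greedy_action, greedy_weights = select_weights([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)
```

Because the policy can only emit indices, training changes which catalog entry is chosen in a given context but never produces weights outside the catalog.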

Paper Structure

This paper contains 21 sections, 8 equations, 5 figures, 3 tables, 3 algorithms.

Figures (5)

  • Figure 1: Safe RL driven Weights-varying MPC: comparison between an untrained and a trained policy. Even the untrained RL policy shows good closed-loop behavior, attributed to the global safety inherent in the pre-computed weights catalog obtained through Multi-objective Bayesian Optimization (MOBO).
  • Figure 2: Architecture of the Safe RL driven Weights-varying MPC
  • Figure 3: Architecture of the Deep Neural Network driven WMPC
  • Figure 4: Benchmark of the closed-loop performance of different settings following an optimal raceline on Monteblanco: RL driven Weights-varying MPC, nominal MPC with manual tuning and with different sets from the Pareto front.
  • Figure 5: Previously unseen track, Las Vegas Motor Speedway (LVMS): lateral RMSE versus velocity RMSE.