A Safe Reinforcement Learning driven Weights-varying Model Predictive Control for Autonomous Vehicle Motion Control

Baha Zarrouki; Marios Spanakakis; Johannes Betz

TL;DR

This work tackles the challenge of selecting MPC cost-function weights that remain effective under varying operating conditions while ensuring safety. It introduces a two-step framework: first, Multiobjective Bayesian Optimization (MOBO) computes a catalog $[A]$ of Pareto-optimal weight sets; second, a Look-ahead Deep Reinforcement Learning (RL) agent selects discrete actions from this catalog to drive a Weights-varying NMPC (WMPC) in real time. The key contributions are the safe RL design that avoids learning in a continuous, potentially unsafe space by restricting choices to pre-optimized weights, and the demonstration that RL-WMPC can achieve closed-loop performance beyond the Pareto front on a full-scale autonomous vehicle, with demonstrated generalization to unseen tracks. Overall, the approach significantly reduces manual tuning while providing online adaptability and safety guarantees, offering practical impact for robust autonomous-vehicle motion control. The methodology combines MOBO for safe weight catalog construction with a discrete-action RL policy that anticipates future tasks via look-ahead trajectories, enabling improved trajectory tracking and robustness across diverse driving scenarios.
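To make "Pareto-optimal weight sets" concrete, here is a minimal sketch of the non-dominated filter that defines such a catalog. It is not the MOBO procedure itself. It only shows which candidates would survive once their closed-loop objectives are known. The candidate weights, the two objectives (lateral and velocity RMSE, both minimized), and all numbers are assumptions made up for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical candidate: (weight set, (lateral RMSE, velocity RMSE)).
# Both objectives are assumed to be minimized.
Candidate = Tuple[Tuple[float, ...], Tuple[float, float]]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_catalog(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates whose objective vector no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other[1], c[1]) for other in candidates)
    ]


# Made-up evaluation results for four hypothetical weight sets.
candidates = [
    ((1.0, 0.1), (0.10, 0.50)),
    ((0.5, 0.5), (0.20, 0.30)),
    ((0.1, 1.0), (0.40, 0.20)),
    ((0.3, 0.3), (0.30, 0.40)),  # worse than the second candidate in both objectives
]
catalog = pareto_catalog(candidates)
```

In this toy data the fourth candidate is dropped because the second one is better in both objectives. The remaining three each trade one objective against the other, which is what makes them useful as distinct choices for a context-dependent selector.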

Abstract

Determining the optimal cost function parameters of Model Predictive Control (MPC) to optimize multiple control objectives is a challenging and time-consuming task. Multiobjective Bayesian Optimization (BO) techniques solve this problem by determining a Pareto-optimal parameter set for an MPC with static weights. However, a single parameter set may not deliver the best closed-loop control performance when the context of the MPC operating conditions changes during operation, which creates the need to adapt the cost function weights at runtime. Deep Reinforcement Learning (RL) algorithms can automatically learn context-dependent optimal parameter sets and dynamically adapt them for a Weights-varying MPC (WMPC). However, learning cost function weights from scratch in a continuous action space may lead to unsafe operating states. To solve this, we propose a novel approach that limits the RL actions to a safe learning space representing a catalog of pre-optimized BO Pareto-optimal weight sets. We design an RL agent that does not learn in a continuous space but instead proactively anticipates upcoming control tasks and chooses, depending on the context, the best discrete action, where each action corresponds to a single set of Pareto-optimal weights. Hence, even an untrained RL agent guarantees safe and optimal performance. Experimental results demonstrate that an untrained RL-WMPC shows Pareto-optimal closed-loop behavior and that training the RL-WMPC yields performance beyond the Pareto front.
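The restriction of the action space described above can be sketched as a lookup from a discrete action index into a fixed catalog. This is only a structural illustration under stated assumptions: the `WeightSet` fields, the catalog values, and the random stand-in for an untrained policy are hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WeightSet:
    """Hypothetical MPC cost weights; the field names are illustrative only."""
    q_lateral: float
    q_velocity: float
    r_input: float


# Hypothetical catalog of pre-optimized weight sets (values are made up).
CATALOG: Tuple[WeightSet, ...] = (
    WeightSet(10.0, 1.0, 0.1),
    WeightSet(5.0, 5.0, 0.1),
    WeightSet(1.0, 10.0, 0.5),
)


def weights_for_action(action: int) -> WeightSet:
    """Map a discrete action index to a catalog entry; reject anything else."""
    if not 0 <= action < len(CATALOG):
        raise ValueError(f"action {action} is outside the catalog")
    return CATALOG[action]


def untrained_policy(rng: random.Random) -> int:
    """Stand-in for an untrained agent: a uniformly random discrete action."""
    return rng.randrange(len(CATALOG))


# Even random actions can only ever produce weights that are in the catalog.
rng = random.Random(0)
chosen = [weights_for_action(untrained_policy(rng)) for _ in range(100)]
```

Because every reachable action maps to a catalog entry, the quality of the worst possible policy is bounded by the quality of the catalog. Training then only has to learn which entry suits which upcoming driving context.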

Paper Structure (21 sections, 8 equations, 5 figures, 3 tables, 3 algorithms)

Introduction
Weights-varying MPC
Static Nonlinear MPC
Multi-objective Bayesian Optimization: Determining Pareto Optimal Weight Sets
Formulating the optimization problem
Surrogate models
Acquisition function
Training track segmentation
Optimization algorithm
Pareto front reduction
Deep Reinforcement Learning: Safe Learning Weights-varying MPC
Defining the action space
Designing the reward function
Defining the observations
Simulation Results: Safe RL driven Weights-varying NMPC
...and 6 more sections

Figures (5)

Figure 1: Safe RL driven Weights-varying MPC: comparison between an untrained and a trained policy. Even the untrained RL policy shows good closed-loop behavior, attributed to the global safety inherent in the pre-computed weights catalog obtained through Multi-objective Bayesian Optimization (MOBO).
Figure 2: Architecture of the Safe RL driven Weights-varying MPC
Figure 3: Architecture of the Deep Neural Network driven WMPC
Figure 4: Benchmark of the closed-loop performance of different settings following an optimal raceline on Monteblanco: RL driven Weights-varying MPC, nominal MPC with manual tuning and with different sets from the Pareto front.
Figure 5: Previously unseen track, Las Vegas Motor Speedway (LVMS): lateral versus velocity RMSE behavior.
