Reinforcement Learning-based Receding Horizon Control using Adaptive Control Barrier Functions for Safety-Critical Systems

Ehsan Sabouni, H. M. Sabbir Ahmad, Vittorio Giammarino, Christos G. Cassandras, Ioannis Ch. Paschalidis, Wenchao Li

TL;DR

This work addresses safety-critical control by marrying adaptive Control Barrier Functions (CBFs) with Model Predictive Control (MPC) in a Receding Horizon framework. By parameterizing both the MPC objective and the CBF/CLF constraints and learning these parameters via reinforcement learning, the approach balances safety with performance without backpropagating through the MPC-CBF solver. Applied to multi-vehicle merging for Connected and Automated Vehicles, the method demonstrates substantial reductions in infeasibility (approximately 65%) and improved efficiency metrics, while preserving safety through high-order CBF guarantees. The proposed bilevel RL-MPC-CBF framework enables scalable, generalizable control in safety-critical, time-constrained settings and opens avenues for extending to mixed-traffic and more complex multi-agent scenarios.

Abstract

Optimal control methods provide solutions to safety-critical problems but easily become intractable. Control Barrier Functions (CBFs) have emerged as a popular technique that facilitates their solution by provably guaranteeing safety, through their forward invariance property, at the expense of some performance loss. This approach involves defining a performance objective alongside CBF-based safety constraints that must always be enforced. Unfortunately, both performance and solution feasibility can be significantly impacted by two key factors: (i) the selection of the cost function and associated parameters, and (ii) the calibration of parameters within the CBF-based constraints, which capture the trade-off between performance and conservativeness. To address these challenges, we propose a Reinforcement Learning (RL)-based Receding Horizon Control (RHC) approach leveraging Model Predictive Control (MPC) with CBFs (MPC-CBF). In particular, we parameterize our controller and use bilevel optimization, where RL is used to learn the optimal parameters while MPC computes the optimal control input. We validate our method by applying it to the challenging automated merging control problem for Connected and Automated Vehicles (CAVs) at conflicting roadways. Results demonstrate improved performance and a significant reduction in the number of infeasible cases compared to traditional heuristic approaches used for tuning CBF-based controllers, showcasing the effectiveness of the proposed method.
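
The bilevel structure described in the abstract (an outer loop that picks controller parameters, an inner optimization that computes a safe control input) can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the plant is a 1-D double integrator approaching a position limit `p_max`, the inner problem is a one-step CBF-based quadratic program solved in closed form rather than a multi-step MPC, and the outer loop is a plain grid search over a single class $\mathcal{K}$ gain `k` standing in for the RL agent. The names `cbf_qp`, `rollout`, and `tune` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    cost: float            # tracking cost plus infeasibility penalty
    infeasible_steps: int  # steps at which the inner QP had no solution
    min_barrier: float     # smallest value of b(x) = p_max - p seen


def cbf_qp(u_ref, cbf_upper, u_max):
    """Solve min (u - u_ref)^2 s.t. u <= cbf_upper and |u| <= u_max.

    Returns None when the CBF bound conflicts with the input limits.
    """
    if cbf_upper < -u_max:
        return None
    return max(-u_max, min(u_ref, cbf_upper, u_max))


def rollout(k, p_max=30.0, v0=10.0, v_des=10.0, u_max=3.0, dt=0.05, steps=200):
    """Simulate p' = v, v' = u under the CBF-QP with linear class-K gain k."""
    p, v = 0.0, v0
    cost, infeasible, min_b = 0.0, 0, p_max - p
    for _ in range(steps):
        b = p_max - p                 # safety constraint b(x) >= 0
        psi1 = -v + k * b             # psi_1 = db/dt + k * b
        cbf_upper = -k * v + k * psi1  # from d(psi_1)/dt + k * psi_1 >= 0
        u = cbf_qp(v_des - v, cbf_upper, u_max)
        if u is None:                 # inner problem infeasible
            infeasible += 1
            u = -u_max                # fallback (assumption): brake fully
        cost += (v_des - v) ** 2 * dt
        min_b = min(min_b, b)
        p, v = p + v * dt, v + u * dt  # explicit Euler step
    return Rollout(cost + 1e4 * infeasible, infeasible, min_b)


def tune(candidates):
    """Outer level: grid search over k (a stand-in for the RL agent)."""
    return min(candidates, key=lambda k: rollout(k).cost)


best_k = tune([0.3, 0.5, 0.7, 1.0, 2.0, 5.0])
```

In this toy setting a gain that is too small leaves the initial state outside the higher-order safe set, while a gain that is too large delays braking until the required deceleration exceeds `u_max`; both cases make the inner problem infeasible, which is the kind of parameter sensitivity the abstract attributes to hand-tuned CBF controllers.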

Paper Structure

This paper contains 10 sections, 1 theorem, 31 equations, 6 figures, 1 table.

Key Result

Theorem 1

Given a constraint $b(\boldsymbol{x}(t))$ with the associated sets $C_i$ as defined in (C set), any Lipschitz continuous controller $\boldsymbol{u}(t)$ that satisfies (HOCBF) $\forall t \geq t_{0}$ renders the sets $C_i$ (including the set $C_1$ corresponding to the actual safety constraint) forward invariant.
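
For reference, the labels (C set) and (HOCBF) are not reproduced in this summary. A sketch of the standard construction from the HOCBF literature cited in Definition 5 is given below, assuming a constraint $b$ of relative degree $m$ and differentiable class $\mathcal{K}$ functions $\alpha_1, \dots, \alpha_m$; the paper's own notation may differ.

$$
\psi_0(\boldsymbol{x}) := b(\boldsymbol{x}), \qquad \psi_i(\boldsymbol{x}) := \dot{\psi}_{i-1}(\boldsymbol{x}) + \alpha_i\big(\psi_{i-1}(\boldsymbol{x})\big), \quad i = 1, \dots, m,
$$

$$
C_i := \{\boldsymbol{x} : \psi_{i-1}(\boldsymbol{x}) \geq 0\}, \quad i = 1, \dots, m,
$$

$$
\dot{\psi}_{m-1}(\boldsymbol{x}, \boldsymbol{u}) + \alpha_m\big(\psi_{m-1}(\boldsymbol{x})\big) \geq 0 \quad \forall \boldsymbol{x} \in C_1 \cap \dots \cap C_m .
$$

The control input appears only in the last inequality, which is the constraint imposed on the controller. In the standard statement of the result, forward invariance additionally requires the initial state to satisfy $\boldsymbol{x}(t_0) \in C_1 \cap \dots \cap C_m$.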

Figures (6)

  • Figure 1: RL training pipeline for parametrized MPC-CBF. The RL agent learns the parameters $[\boldsymbol{\theta}_{c,k} \ \boldsymbol{\theta}_{o,k} \ \boldsymbol{\theta}_{e,k}]^T$, where $\boldsymbol{\theta}_o$ is the vector of learnable parameters of the objective, $\boldsymbol{\theta}_c$ is the vector of learnable parameters of the CLF constraint, and $\boldsymbol{\theta}_e$ is the vector of weights of the penalty terms associated with the relaxation parameters of the CLF constraints. These parameters are then used in the MPC-CBF problem (MPC-CBF), which is solved to compute the optimal control input.
  • Figure 2: The merging control problem for CAVs
  • Figure 3: The ellipsoid for safety
  • Figure 4: Illustration of the scenario used to generate rollouts during RL training.
  • Figure 5: Simulation results for the scenario depicted in Fig. \ref{fig:merging} with the baseline approach. (a): A screenshot of the simulation at a point where vehicles $3$ and $4$ encounter infeasibility, as indicated by yellow and green dashes. (b): Steering angle and acceleration profiles of all vehicles, showing that vehicles $3$ and $4$ violated the bounds on their control inputs. (c): CBF constraint values for the right road boundary and the safe merging constraints of vehicles $4$ and $3$, respectively. Both curves fall well below zero, indicating a clear constraint violation. (d): Evolution of the class $\mathcal{K}$ function values $\alpha_2(b_2(\boldsymbol{x}))$ for the safe merging constraint of vehicle $3$.
  • ...and 1 more figure

Theorems & Definitions (7)

  • Definition 1: Class $\mathcal{K}$ function
  • Definition 2
  • Definition 3: Control barrier function [Ames_01]
  • Definition 4: Relative degree
  • Definition 5: High Order CBF (HOCBF) [xiao2019HOCBF]
  • Theorem 1 [Ames_01]
  • Definition 6: Control Lyapunov function (CLF) [ames2012control]