Reinforcement Learning-based Receding Horizon Control using Adaptive Control Barrier Functions for Safety-Critical Systems

Ehsan Sabouni; H. M. Sabbir Ahmad; Vittorio Giammarino; Christos G. Cassandras; Ioannis Ch. Paschalidis; Wenchao Li

TL;DR

This work addresses safety-critical control by marrying adaptive Control Barrier Functions (CBFs) with Model Predictive Control (MPC) in a Receding Horizon framework. By parameterizing both the MPC objective and the CBF/CLF constraints and learning these parameters via reinforcement learning, the approach balances safety with performance without backpropagating through the MPC-CBF solver. Applied to multi-vehicle merging for Connected and Automated Vehicles, the method demonstrates substantial reductions in infeasibility (approximately 65%) and improved efficiency metrics, while preserving safety through high-order CBF guarantees. The proposed bilevel RL-MPC-CBF framework enables scalable, generalizable control in safety-critical, time-constrained settings and opens avenues for extending to mixed-traffic and more complex multi-agent scenarios.

Abstract

Optimal control methods provide solutions to safety-critical problems but easily become intractable. Control Barrier Functions (CBFs) have emerged as a popular technique that facilitates their solution by provably guaranteeing safety, through their forward invariance property, at the expense of some performance loss. This approach involves defining a performance objective alongside CBF-based safety constraints that must always be enforced. Unfortunately, both performance and solution feasibility can be significantly impacted by two key factors: (i) the selection of the cost function and associated parameters, and (ii) the calibration of parameters within the CBF-based constraints, which capture the trade-off between performance and conservativeness. To address these challenges, we propose a Reinforcement Learning (RL)-based Receding Horizon Control (RHC) approach leveraging Model Predictive Control (MPC) with CBFs (MPC-CBF). In particular, we parameterize our controller and use bilevel optimization, where RL is used to learn the optimal parameters while MPC computes the optimal control input. We validate our method by applying it to the challenging automated merging control problem for Connected and Automated Vehicles (CAVs) at conflicting roadways. Results demonstrate improved performance and a significant reduction in the number of infeasible cases compared to traditional heuristic approaches used for tuning CBF-based controllers, showcasing the effectiveness of the proposed method.
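
To make the bilevel idea and the role of the CBF parameter concrete, the following is a minimal sketch on a toy problem and not the authors' method. Everything in it is an assumption introduced for illustration: a one-dimensional integrator $\dot{x} = u$, a safety constraint $b(x) = x_{\max} - x \geq 0$, a linear class $\mathcal{K}$ function $\alpha(b) = \alpha b$, a one-step closed-form filter in place of the MPC-CBF solver, and a plain grid search standing in for the RL agent. The names `cbf_filter`, `rollout`, and `tune_alpha` are hypothetical.

```python
def cbf_filter(x, u_nom, alpha, x_max=1.0):
    """Inner problem: min (u - u_nom)^2 subject to the CBF constraint.

    For x_dot = u and b(x) = x_max - x, the constraint
    b_dot + alpha * b >= 0 reduces to u <= alpha * (x_max - x),
    so the minimizer is available in closed form.
    """
    return min(u_nom, alpha * (x_max - x))


def rollout(alpha, x0=0.0, x_goal=1.2, x_max=1.0, dt=0.05, steps=100):
    """Simulate the filtered controller; return (tracking cost, min of b)."""
    x, cost, min_b = x0, 0.0, x_max - x0
    for _ in range(steps):
        u_nom = 2.0 * (x_goal - x)           # nominal controller, unaware of safety
        u = cbf_filter(x, u_nom, alpha, x_max)
        x += dt * u                          # forward Euler step
        cost += dt * (x_goal - x) ** 2       # performance objective
        min_b = min(min_b, x_max - x)        # smallest safety margin seen so far
    return cost, min_b


def tune_alpha(candidates):
    """Outer problem: choose the CBF parameter using rollout outcomes only.

    A grid search stands in for the RL agent (an assumption of this sketch).
    Like the approach described above, it never differentiates through
    the inner solver.
    """
    best = None
    for alpha in candidates:
        cost, min_b = rollout(alpha)
        if min_b < 0.0:                      # rollout left the safe set: reject
            continue
        if best is None or cost < best[1]:
            best = (alpha, cost)
    return best


if __name__ == "__main__":
    for a in (0.5, 1.0, 2.0, 4.0):
        print(a, rollout(a))
    print("selected:", tune_alpha([0.5, 1.0, 2.0, 4.0]))
```

In this toy setting the goal lies outside the safe set, so every rollout stays safe and a smaller `alpha` only makes the approach to the boundary slower, which is the conservativeness side of the trade-off. The Euler discretization keeps $b \geq 0$ here because `dt * alpha <= 1` for all candidates; that condition is specific to this sketch.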

Paper Structure (10 sections, 1 theorem, 31 equations, 6 figures, 1 table)

Introduction
Preliminaries
Control Barrier Functions
Reinforcement Learning
Model Predictive Control
Problem Formulation
Parameterized MPC-CBF Control Design
Multi-Agent Control of CAVs
Simulation Results
Conclusion

Key Result

Theorem 1

Given a constraint $b(\boldsymbol{x}(t))$ with the associated sets $C_i$ as defined in (C set), any Lipschitz continuous controller $\boldsymbol{u}(t)$ that satisfies (HOCBF) $\forall t \geq t_{0}$ renders the sets $C_i$ (including the set $C_1$ corresponding to the actual safety constraint) forward invariant [...]
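
The equations labeled (C set) and (HOCBF) are not reproduced on this page. As a reading aid, the block below sketches the standard high-order CBF construction from the cited HOCBF literature, assuming a constraint $b(\boldsymbol{x}) \geq 0$ of relative degree $m$ and class $\mathcal{K}$ functions $\alpha_1, \dots, \alpha_m$. The symbols $\psi_i$ and $m$ are assumptions of this sketch, and the paper's own notation may differ.

```latex
\begin{align*}
  \psi_0(\boldsymbol{x}) &:= b(\boldsymbol{x}), \\
  \psi_i(\boldsymbol{x}) &:= \dot{\psi}_{i-1}(\boldsymbol{x})
      + \alpha_i\bigl(\psi_{i-1}(\boldsymbol{x})\bigr),
      \quad i = 1, \dots, m, \\
  C_i &:= \{\boldsymbol{x} : \psi_{i-1}(\boldsymbol{x}) \geq 0\},
      \quad i = 1, \dots, m, \\
  \dot{\psi}_{m-1}(\boldsymbol{x}, \boldsymbol{u})
      + \alpha_m\bigl(\psi_{m-1}(\boldsymbol{x})\bigr) &\geq 0
      \quad \text{(the constraint imposed on } \boldsymbol{u}\text{)}.
\end{align*}
```

Under this construction $C_1 = \{\boldsymbol{x} : b(\boldsymbol{x}) \geq 0\}$ is the original safe set, and the control input appears explicitly only in the last inequality, because $b$ has relative degree $m$.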

Figures (6)

Figure 1: RL training pipeline for parameterized MPC-CBF. The RL agent learns the parameters $[\boldsymbol{\theta}_{c,k} \ \boldsymbol{\theta}_{o,k} \ \boldsymbol{\theta}_{e,k}]^T$, where $\boldsymbol{\theta}_o$ is the vector of learnable parameters of the objective, $\boldsymbol{\theta}_c$ is the vector of learnable parameters of the CLF constraint, and $\boldsymbol{\theta}_e$ is the vector of weights of the penalty terms associated with the relaxation parameters of the CLF constraints. These parameters are then used in the MPC-CBF problem in (\ref{MPC-CBF}), which is solved to compute the optimal control input.
Figure 2: The merging control problem for CAVs
Figure 3: The ellipsoid for safety
Figure 4: Illustration of the scenario used to generate rollouts during RL training.
Figure 5: Simulation results of the scenario depicted in Fig. \ref{fig:merging} with the baseline approach. (a): A screenshot of the simulation at a point where vehicles $3$ and $4$ encounter infeasibility, as indicated by yellow and green dashes. (b): Steering angle and acceleration profiles of all vehicles, showing that vehicles $3$ and $4$ violated their control input bounds. (c): CBF constraint values for the right road boundary and the safe merging constraint of vehicles $4$ and $3$, respectively. Both curves fall well below zero, which indicates a constraint violation. (d): Evolution of the class $\mathcal{K}$ function $\alpha_2(b_2(\boldsymbol{x}))$ values for the safe merging constraint of CAV 3.
...and 1 more figure

Theorems & Definitions (7)

Definition 1: Class $\mathcal{K}$ function
Definition 2
Definition 3: Control barrier function [Ames_01]
Definition 4: Relative degree
Definition 5: High Order CBF (HOCBF) [xiao2019HOCBF]
Theorem 1: [Ames_01]
Definition 6: Control Lyapunov function (CLF) [ames2012control]
