Reinforcement Learning-based Receding Horizon Control using Adaptive Control Barrier Functions for Safety-Critical Systems
Ehsan Sabouni, H. M. Sabbir Ahmad, Vittorio Giammarino, Christos G. Cassandras, Ioannis Ch. Paschalidis, Wenchao Li
TL;DR
This work addresses safety-critical control by combining adaptive Control Barrier Functions (CBFs) with Model Predictive Control (MPC) in a Receding Horizon Control framework. By parameterizing both the MPC objective and the CBF/CLF constraints and learning these parameters via reinforcement learning, the approach balances safety and performance without backpropagating through the MPC-CBF solver. Applied to multi-vehicle merging of Connected and Automated Vehicles (CAVs), the method reduces the number of infeasible cases by approximately 65% relative to heuristically tuned CBF-based controllers and improves efficiency metrics, while preserving safety through high-order CBF guarantees. The proposed bilevel RL-MPC-CBF framework enables scalable, generalizable control in safety-critical, time-constrained settings and opens avenues for extensions to mixed traffic and more complex multi-agent scenarios.
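For reference, the high-order CBF (HOCBF) guarantee mentioned above can be written down for a safety constraint b(x) ≥ 0 of relative degree two under control-affine dynamics ẋ = f(x) + g(x)u (so L_g b(x) = 0). The sketch below uses the standard construction with linear class-K functions whose gains p_1, p_2 > 0 are the kind of CBF parameters an outer learning loop could tune; the exact parameterization used in this work is not stated here, so treat this form as an assumption.

```latex
\begin{aligned}
\psi_0(x) &= b(x), \\
\psi_1(x) &= \dot{\psi}_0(x) + p_1\,\psi_0(x) = L_f b(x) + p_1\, b(x), \\
\psi_2(x,u) &= \dot{\psi}_1(x,u) + p_2\,\psi_1(x) \\
&= L_f^2 b(x) + L_g L_f b(x)\,u + (p_1 + p_2)\,L_f b(x) + p_1 p_2\, b(x) \;\ge\; 0 .
\end{aligned}
```

If ψ_2 ≥ 0 is enforced at all times and the initial state satisfies b(x(0)) ≥ 0 and ψ_1(x(0)) ≥ 0, then b(x(t)) ≥ 0 for all t. The gains set the trade-off named in the abstract: larger p_1, p_2 make the constraint permissive far from the boundary but demand more aggressive inputs near it, which can conflict with input bounds and render the problem infeasible, while smaller gains act earlier and more conservatively.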
Abstract
Optimal control methods provide solutions to safety-critical problems but can easily become computationally intractable. Control Barrier Functions (CBFs) have emerged as a popular technique that facilitates their solution by provably guaranteeing safety, through their forward invariance property, at the expense of some performance loss. This approach involves defining a performance objective alongside CBF-based safety constraints that must always be enforced. Unfortunately, both performance and solution feasibility can be significantly impacted by two key factors: (i) the selection of the cost function and associated parameters, and (ii) the calibration of parameters within the CBF-based constraints, which capture the trade-off between performance and conservativeness. To address these challenges, we propose a Reinforcement Learning (RL)-based Receding Horizon Control (RHC) approach leveraging Model Predictive Control (MPC) with CBFs (MPC-CBF). In particular, we parameterize our controller and use bilevel optimization, where RL is used to learn the optimal parameters while MPC computes the optimal control input. We validate our method by applying it to the challenging automated merging control problem for Connected and Automated Vehicles (CAVs) at conflicting roadways. Results demonstrate improved performance and a significant reduction in the number of infeasible cases compared to traditional heuristic approaches used for tuning CBF-based controllers, showcasing the effectiveness of the proposed method.
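The bilevel structure described above (an upper level that picks controller parameters, a lower level that computes the control input) can be illustrated with a deliberately small sketch. Everything in it is an assumption for illustration only: a hypothetical 1-D vehicle approaching a stop line replaces the CAV merging model, a one-step CBF quadratic program (solvable in closed form because the input is scalar) replaces the MPC-CBF problem, and plain random search over the HOCBF gains p1, p2 replaces the RL upper level. The names `inner_cbf_qp`, `run_episode`, `reward` and `tune_parameters` are hypothetical. The point is only that the upper level scores parameters by closed-loop performance and infeasibility, without differentiating through the lower-level solver.

```python
import random

# Hypothetical 1-D scenario (not the paper's merging model): a vehicle with
# position x, speed v and acceleration input u must not pass a stop line.
DT = 0.1           # sampling time [s]
A_MAX = 3.0        # input bound |u| <= A_MAX [m/s^2]
STOP_LINE = 100.0  # safe set: b(x) = STOP_LINE - x >= 0 [m]
V_REF = 10.0       # desired cruising speed [m/s]
STEPS = 300        # episode length (30 s)


def inner_cbf_qp(x, v, p1, p2, u_ref):
    """Lower level (stand-in for MPC-CBF): min (u - u_ref)^2 s.t. HOCBF, |u| <= A_MAX.

    For b = STOP_LINE - x (relative degree 2) with linear class-K gains
    p1, p2, the HOCBF condition reads
        -u - (p1 + p2) * v + p1 * p2 * (STOP_LINE - x) >= 0.
    With a scalar input the QP reduces to clipping u_ref onto the feasible
    interval; None signals an infeasible QP (empty interval).
    """
    u_cbf = p1 * p2 * (STOP_LINE - x) - (p1 + p2) * v
    upper = min(A_MAX, u_cbf)
    if upper < -A_MAX:
        return None
    return max(-A_MAX, min(u_ref, upper))


def run_episode(p1, p2):
    """Roll out the closed loop; return (final position, infeasible steps).

    The continuous-time HOCBF condition is only enforced at sampling
    instants, so this discrete-time sketch carries no formal safety guarantee.
    """
    x, v = 0.0, V_REF
    infeasible = 0
    for _ in range(STEPS):
        u_ref = V_REF - v  # nominal speed-tracking controller
        u = inner_cbf_qp(x, v, p1, p2, u_ref)
        if u is None:
            infeasible += 1
            u = -A_MAX  # fallback: brake as hard as allowed
        x += v * DT  # forward-Euler integration
        v = max(0.0, v + u * DT)
    return x, infeasible


def reward(p1, p2, penalty=100.0):
    """Upper-level objective: progress minus penalties for infeasibility and violation."""
    x, infeasible = run_episode(p1, p2)
    violation_penalty = 1e4 if x > STOP_LINE else 0.0
    return x - penalty * infeasible - violation_penalty


def tune_parameters(n_samples=200, seed=0):
    """Upper level: random search over (p1, p2), used here as a stand-in for RL."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        p1, p2 = rng.uniform(0.05, 2.0), rng.uniform(0.05, 2.0)
        r = reward(p1, p2)
        if best is None or r > best[0]:
            best = (r, p1, p2)
    return best


if __name__ == "__main__":
    r, p1, p2 = tune_parameters()
    print(f"best gains: p1={p1:.2f}, p2={p2:.2f}, reward={r:.1f}")
```

On this toy problem, large gains (e.g. p1 = p2 = 1.0) let the vehicle cruise until close to the line and then request more braking than A_MAX allows, producing infeasible steps, while smaller gains (e.g. p1 = p2 = 0.5) stay feasible but start braking earlier. This mirrors, in miniature, the performance/conservativeness/feasibility trade-off the abstract describes; a real implementation would replace the closed-form QP with a receding-horizon MPC-CBF solver and the random search with an RL algorithm.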
