Certificated Actor-Critic: Hierarchical Reinforcement Learning with Control Barrier Functions for Safe Navigation
Junjun Xie, Shuhao Zhao, Liang Hu, Huijun Gao
TL;DR
This work tackles safe robot navigation in model-free settings by integrating Control Barrier Functions (CBFs) with hierarchical reinforcement learning. The proposed Certificated Actor-Critic (CAC) first learns a safe policy using a CBF-derived reward $r_1$ and a safety critic $V_1^\pi$, then performs a restricted policy update that improves goal-reaching under a second-stage reward $r_2$ while preserving safety through a combined gradient constraint. The safety certificates $V_1^\pi$ and $Q_1^\pi$ quantify how safe a policy is, and the analysis shows that safe policies remain safe under the restricted update. Implementation enhancements include KL-based policy improvement, exponential reward normalization, and gradient alignment constraints. Experiments on a Continuous CartPole task and an Autonomous Underwater Vehicle (AUV) navigation scenario show that CAC maintains safety while improving navigation performance, and ablation studies confirm the value of the restricted update. Overall, CAC offers a principled, model-free route to safe navigation with quantitative safety certification.
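The two stages summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the reward form (a penalty equal to the violation of a discrete-time CBF condition with a hypothetical decay rate `alpha`) and the projection used for the restricted update (removing the component of the goal gradient that opposes the safety gradient, in the spirit of gradient-projection methods) are assumptions chosen for illustration, and the function names `cbf_reward` and `restricted_gradient` are hypothetical.

```python
import numpy as np


def cbf_reward(h_now, h_next, alpha=0.1):
    """Assumed stage-one reward r_1 built from a barrier function h.

    The discrete-time CBF condition used here is
        h(x') - h(x) >= -alpha * h(x).
    The reward is 0 when the condition holds and equals the (negative)
    violation otherwise. This form is an illustrative assumption.
    """
    margin = h_next - (1.0 - alpha) * h_now
    return min(margin, 0.0)


def restricted_gradient(g_goal, g_safe, eps=1e-12):
    """Assumed stage-two update direction.

    If the goal-reaching gradient g_goal has a negative inner product with
    the safety gradient g_safe, its opposing component is projected out so
    the returned direction does not decrease the safety objective to first
    order. Otherwise g_goal is returned unchanged.
    """
    g_goal = np.asarray(g_goal, dtype=float)
    g_safe = np.asarray(g_safe, dtype=float)
    dot = float(g_goal @ g_safe)
    if dot >= 0.0:
        return g_goal
    return g_goal - dot / (float(g_safe @ g_safe) + eps) * g_safe


if __name__ == "__main__":
    # Barrier value decays slowly enough: condition holds, no penalty.
    print(cbf_reward(h_now=1.0, h_next=0.95))
    # Barrier value drops too fast: penalty equals the violation.
    print(cbf_reward(h_now=1.0, h_next=0.5))
    # Conflicting gradients: the opposing component is removed.
    print(restricted_gradient([1.0, -1.0], [0.0, 1.0]))
```

Under these assumptions, a trajectory that never violates the CBF condition accumulates zero stage-one reward, so a value of zero for the safety critic would indicate a safe policy, which is one way to read the "certificate" role of $V_1^\pi$ described above.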
Abstract
Control Barrier Functions (CBFs) have emerged as a prominent approach to designing safe navigation systems for robots. Despite their popularity, current CBF-based methods have limitations: optimization-based safe control techniques tend to be either myopic or computationally intensive, and they rely on simplified system models; learning-based methods, in turn, lack quantitative indicators of navigation performance and safety. In this paper, we present a new model-free reinforcement learning algorithm called Certificated Actor-Critic (CAC), which introduces a hierarchical reinforcement learning framework and well-defined reward functions derived from CBFs. We provide a theoretical analysis of the algorithm, with proofs, and propose several improvements to its implementation. Two simulation experiments validate the analysis and demonstrate the effectiveness of the proposed CAC algorithm.
