Certificated Actor-Critic: Hierarchical Reinforcement Learning with Control Barrier Functions for Safe Navigation

Junjun Xie, Shuhao Zhao, Liang Hu, Huijun Gao

TL;DR

This work tackles safe robot navigation in model-free settings by integrating Control Barrier Functions (CBFs) with hierarchical reinforcement learning. The proposed Certificated Actor-Critic (CAC) learns safety first using a CBF-derived reward $r_1$ and a safety critic $V_1^\pi$, then performs a restricted policy update to improve goal-reaching while preserving safety, guided by a second-stage reward $r_2$ and a combined gradient constraint. The safety certificates $V_1^\pi$ and $Q_1^\pi$ quantify safety and guarantee that safe policies remain safe under updates, with enhancements including KL-based policy improvements, exponential reward normalization, and gradient alignment constraints. Empirical validation on a Continuous CartPole task and an Autonomous Underwater Vehicle (AUV) navigation scenario demonstrates robust safety guarantees and improved navigation performance, supported by ablation studies showing the value of restricted updates. Overall, CAC offers a principled, model-free pathway to safe navigation with quantitative safety certification and practical improvements for stable learning.
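The gradient alignment idea mentioned above can be illustrated with a small sketch. This is not the paper's update rule: it is a generic projection heuristic, assumed here only for illustration, that removes the component of a goal-reaching gradient that opposes the safety gradient. The names `align_gradient`, `g_task`, and `g_safe` are hypothetical.

```python
import numpy as np

def align_gradient(g_task, g_safe, eps=1e-12):
    """Return a task gradient that does not oppose the safety gradient.

    Assumption: both gradients are flat vectors over the same policy
    parameters. If they conflict (negative inner product), the conflicting
    component of g_task along g_safe is projected out.
    """
    dot = float(np.dot(g_task, g_safe))
    if dot >= 0.0:
        # No conflict: the task step does not reduce safety to first order.
        return g_task.copy()
    # Conflict: subtract the projection of g_task onto g_safe.
    return g_task - dot / (float(np.dot(g_safe, g_safe)) + eps) * g_safe

# Illustrative values only (not taken from the paper).
g_safe = np.array([1.0, 0.0])   # direction that increases the safety value
g_task = np.array([-1.0, 1.0])  # direction that increases the task value
g_aligned = align_gradient(g_task, g_safe)
```

After the projection, `g_aligned` keeps the part of the task gradient that is orthogonal to the safety gradient, so a small step along it leaves the safety objective unchanged to first order.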

Abstract

Control Barrier Functions (CBFs) have emerged as a prominent approach to designing safe navigation systems for robots. Despite their popularity, current CBF-based methods have limitations: optimization-based safe control techniques tend to be either myopic or computationally intensive, and they rely on simplified system models; learning-based methods, conversely, lack quantitative indicators of navigation performance and safety. In this paper, we present a new model-free reinforcement learning algorithm called Certificated Actor-Critic (CAC), which introduces a hierarchical reinforcement learning framework and well-defined reward functions derived from CBFs. We provide a theoretical analysis of the algorithm with proofs, and propose several implementation improvements. Our analysis is validated by two simulation experiments, which show the effectiveness of the proposed CAC algorithm.
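As a rough illustration of how a reward can be derived from a CBF, the sketch below uses the standard discrete-time CBF condition $h(x_{t+1}) - h(x_t) \ge -\alpha\, h(x_t)$ with $\alpha \in (0, 1]$. The reward shape, the barrier function `h`, the bound `x_max`, and the value of `alpha` are all assumptions made for this example; the paper's actual definition of $r_1$ may differ.

```python
def cbf_residual(h_now, h_next, alpha=0.1):
    """Discrete-time CBF residual: h(x') - (1 - alpha) * h(x).

    A non-negative residual means the CBF condition holds for this step.
    """
    return h_next - (1.0 - alpha) * h_now

def safety_reward(h_now, h_next, alpha=0.1):
    """Hypothetical safety reward: zero when the CBF condition holds,
    and equal to the (negative) residual when it is violated."""
    return min(0.0, cbf_residual(h_now, h_next, alpha))

def h(x, x_max=2.4):
    """Hypothetical barrier for a position bound |x| <= x_max.

    h(x) >= 0 exactly on the safe set; x_max is an illustrative value.
    """
    return x_max ** 2 - x ** 2

# A slow drift away from the centre satisfies the condition ...
r_ok = safety_reward(h(1.0), h(1.1))
# ... while a fast move towards the boundary violates it.
r_bad = safety_reward(h(2.0), h(2.3))
```

With this shape, a policy that always satisfies the condition collects zero return, so under this assumption a return close to zero would indicate safe behaviour.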

Paper Structure

This paper contains 18 sections, 1 theorem, 13 equations, 9 figures, 2 tables, 1 algorithm.

Key Result

Lemma 1

(Forward Invariance) The discrete-time system is forward invariant in the safe set $\mathcal{C}$, i.e., if

Figures (9)

  • Figure 1: The framework of certificated actor-critic.
  • Figure 2: The relationship between a safe policy and its parameters under the parameter-continuity assumption. Since similar safe policies have parameters that vary continuously, the parameters can be updated within a small neighbourhood, which guarantees the safety of the policy.
  • Figure 3: Frames in an episode with $\pi_\text{safe}^*$ and $\pi^*$.
  • Figure 4: Positions and angles of the CartPole under $\pi_\text{safe}^*$ and $\pi^*$ in 10 test episodes. Both policies keep the state within the safe range, and the final policy $\pi^*$ drives the CartPole to the target position while remaining safe.
  • Figure 5: Heatmaps of (a) the safety critic value, (b) the average sampled return, and (c) the safe rate. The three heatmaps are consistent with one another, which validates the safety critic as a good safety certificate.
  • ...and 4 more figures

Theorems & Definitions (3)

  • Lemma 1
  • proof
  • proof