Learning Control Barrier Functions and their application in Reinforcement Learning: A Survey

Maeva Guerrier; Hassan Fouad; Giovanni Beltrame

TL;DR

This survey addresses safe reinforcement learning (SRL) for robotics, focusing on Control Barrier Functions (CBFs) as a principled tool for enforcing forward invariance of safe state sets during both learning and deployment. It reviews soft, hard, and probabilistic safety constraints in SRL and systematically examines how CBFs can be constructed or learned from data (including from demonstrations, inverse RL, and priors), as well as safety-filter and shield architectures and Gaussian process (GP) based uncertainty handling. Its contributions include a comprehensive taxonomy of SRL approaches using CBFs, an analysis of data-driven CBF construction methods, and a discussion of practical challenges such as sim2real transfer, generalization, and deployment certification. The work highlights that CBF-based safety can improve both sample efficiency and safety in RL, while underscoring the need for robust, transferable, and less conservative methods to bridge the gap to real-world, lifelong robotic systems.

Abstract

Reinforcement learning is a powerful technique for developing new robot behaviors. However, its typical lack of safety guarantees is a hurdle to its practical application on real robots. Safe reinforcement learning addresses this issue by incorporating safety considerations, enabling faster transfer to real robots and facilitating lifelong learning. One promising approach within safe reinforcement learning is the use of control barrier functions, which provide a framework for ensuring that the system remains in a safe state during the learning process. However, synthesizing control barrier functions is not straightforward and often requires extensive domain knowledge. This challenge motivates the appealing prospect of data-driven methods that define control barrier functions automatically. We conduct a comprehensive review of the existing literature on safe reinforcement learning using control barrier functions. Additionally, we investigate various techniques for automatically learning control barrier functions, with the aim of enhancing the safety and efficacy of reinforcement learning in practical robot applications.
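For reference, the condition that makes a function a control barrier function is usually stated as follows for control-affine dynamics. This is the standard textbook formulation, written here as an assumption about the setting; the survey's own Definition 2.1 may use different notation or hypotheses. Under it, any locally Lipschitz controller that satisfies the inequality keeps the safe set forward invariant, and a safety filter can be posed as the small optimization in the last line, which minimally modifies the action proposed by the RL policy.

```latex
% Control-affine dynamics (assumed form)
\dot{x} = f(x) + g(x)\,u, \qquad x \in \mathbb{R}^n,\; u \in U \subseteq \mathbb{R}^m

% Safe set: the superlevel set of a continuously differentiable function h
\mathcal{C} = \{\, x \in \mathbb{R}^n : h(x) \ge 0 \,\}

% h is a CBF if, for some extended class-K function \alpha and all x in a set D \supseteq \mathcal{C},
\sup_{u \in U} \big[ L_f h(x) + L_g h(x)\, u \big] \ge -\alpha\big(h(x)\big)

% CBF safety filter: the control closest to the RL proposal u_{\mathrm{RL}}(x) that satisfies the condition
u^{*}(x) = \arg\min_{u \in U} \; \lVert u - u_{\mathrm{RL}}(x) \rVert^{2}
\quad \text{s.t.} \quad L_f h(x) + L_g h(x)\, u \ge -\alpha\big(h(x)\big)
```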

Paper Structure

This paper contains 26 sections, 13 equations, 3 figures, and 3 tables.

INTRODUCTION
Background
Safety categorization in RL
CBF methods in RL
CONTRIBUTIONS
ORGANIZATION
PRELIMINARIES
Control Barrier Functions
Reinforcement learning
SAFE REINFORCEMENT LEARNING
SOFT CONSTRAINTS METHODS
HARD and PROBABILISTIC CONSTRAINTS METHODS
Using safety shields
USING CBF SAFETY FILTERS
CBF CONSTRUCTION
...and 11 more sections

Figures (3)

Figure 1: Illustrative depiction of model-based and data-driven approaches and of their combination. Model-based approaches are robust; however, they are rigid and do not adapt to unseen scenarios. Data-driven approaches, in contrast, adapt to unforeseen cases but offer no bounds against undesired behavior. Combining both methods provides the flexibility of data-driven approaches while still enforcing properties such as safety criteria.
Figure 2: In reinforcement learning, an agent interacts with the environment. In a given state, the agent takes an action and learns from the environment by receiving a reward that depends on that state and the chosen action. The agent then moves to the next state according to a state-transition probability that is unknown in real-world scenarios.
Figure 3: An illustrative example of how a safety filter works. The agent interacts with the environment by proposing an action in a given state and moves to the next state according to an unknown state-transition probability. The filter monitors the action both during training and at deployment, mapping unsafe actions to safe ones to maintain system safety. If the filter has to modify the action, the agent is informed of this and learns from its mistake; otherwise, it receives the reward directly from the environment.
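The filter in Figure 3 is described only qualitatively. A minimal sketch of one common CBF-based realization follows, under assumptions chosen purely for illustration: 2-D single-integrator dynamics (f(x) = 0, g(x) = I), a circular obstacle with barrier h(x) = ||x - c||^2 - r^2, and a linear class-K function alpha(h) = ALPHA * h. The names `h`, `grad_h`, `safety_filter`, `ALPHA`, `CENTER`, and `RADIUS` are hypothetical and not taken from the survey. With a single affine constraint, the minimal-modification problem has a closed-form solution (a projection onto a half-space), so no QP solver is needed.

```python
"""Minimal CBF safety-filter sketch (hypothetical example, not the survey's method).

Assumptions: 2-D single-integrator dynamics x_dot = u (f(x) = 0, g(x) = I),
one circular obstacle, and a linear class-K function alpha(h) = ALPHA * h.
"""

ALPHA = 1.0            # hypothetical gain of the linear class-K function
CENTER = (0.0, 0.0)    # hypothetical obstacle centre
RADIUS = 1.0           # hypothetical obstacle radius


def h(x):
    """Barrier function: positive outside the obstacle, zero on its boundary."""
    dx, dy = x[0] - CENTER[0], x[1] - CENTER[1]
    return dx * dx + dy * dy - RADIUS * RADIUS


def grad_h(x):
    """Gradient of h with respect to the state."""
    return (2.0 * (x[0] - CENTER[0]), 2.0 * (x[1] - CENTER[1]))


def safety_filter(x, u_rl):
    """Return the control closest to u_rl that satisfies the CBF condition.

    With f(x) = 0 and g(x) = I, the condition L_f h + L_g h u + alpha(h) >= 0
    becomes a . u + b >= 0 with a = grad_h(x) and b = ALPHA * h(x).
    """
    a = grad_h(x)
    b = ALPHA * h(x)
    slack = a[0] * u_rl[0] + a[1] * u_rl[1] + b
    if slack >= 0.0:
        # The proposed RL action already satisfies the condition: pass it through.
        return (u_rl[0], u_rl[1])
    norm_sq = a[0] ** 2 + a[1] ** 2
    if norm_sq == 0.0:
        raise ValueError("CBF condition cannot be enforced where grad h = 0")
    # Project u_rl onto the half-space {u : a . u + b >= 0}; the result
    # satisfies the constraint with equality.
    scale = -slack / norm_sq
    return (u_rl[0] + scale * a[0], u_rl[1] + scale * a[1])
```

For example, at x = (2, 0) an RL action of (-5, 0), heading straight at the obstacle, is replaced by (-0.75, 0), which satisfies the CBF condition with equality, while a safe action such as (1, 0) passes through unchanged. Problems with several barriers or input bounds generally require a numerical QP solver instead of this closed form, and the sketch does not model the feedback to the agent that Figure 3 mentions.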

Theorems & Definitions (1)

Definition 2.1
