Guided by Guardrails: Control Barrier Functions as Safety Instructors for Robotic Learning
Maeva Guerrier, Karthik Soma, Hassan Fouad, Giovanni Beltrame
TL;DR
This paper addresses safety in learning-based robotics by embedding Control Barrier Functions (CBFs) as guardrails within reinforcement learning (RL). It proposes three RL-CBF integration strategies (CBF Filter, CBF Reward, and CBF Decay) that enforce safety at every step while still allowing goal-reaching behavior. The strategies are developed on a unicycle abstraction and validated through sim2real transfer to a four-wheel robot. The results show that CBF interventions can guarantee safety and support learning. CBF Decay in particular offers a practical curriculum: the agent learns to reach goals, avoid obstacles, and recover from incidents while depending less on explicit safety guidance over time. The work also highlights the limitations of reward-only safety designs and points to future directions such as preference-based RL and safety-aware training pipelines for real-world deployment. Overall, the study demonstrates that dynamics-agnostic CBF guardrails can be integrated with RL to produce safe, robust robotic policies that transfer from simulation to real hardware.
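The summary names a CBF Filter strategy but does not give its formulation. As a rough illustration of what a CBF safety filter does in general, and not the authors' implementation, the sketch below assumes single-integrator dynamics (x_dot = u) in 2D, one circular obstacle, and the barrier h(x) = ||x - c||^2 - r^2. The function name `cbf_filter`, its parameters, and the gain `alpha` are hypothetical. The filter returns the action closest to the nominal one that satisfies grad_h(x) . u + alpha * h(x) >= 0.

```python
import math


def cbf_filter(x, u_nom, center, radius, alpha=1.0):
    """Minimally correct u_nom so the barrier condition holds.

    Assumptions (illustrative only): 2D single integrator x_dot = u,
    one circular obstacle, barrier h(x) = ||x - c||^2 - r^2.
    Returns (safe_action, correction_norm).
    """
    dx = (x[0] - center[0], x[1] - center[1])
    h = dx[0] ** 2 + dx[1] ** 2 - radius ** 2      # h >= 0 means safe
    grad_h = (2.0 * dx[0], 2.0 * dx[1])            # gradient of h at x

    # Barrier condition evaluated at the nominal action.
    slack = grad_h[0] * u_nom[0] + grad_h[1] * u_nom[1] + alpha * h
    if slack >= 0.0:
        return (float(u_nom[0]), float(u_nom[1])), 0.0  # already safe

    g2 = grad_h[0] ** 2 + grad_h[1] ** 2
    if g2 == 0.0:
        raise ValueError("state coincides with obstacle center")

    # Closed-form projection of u_nom onto the half-space
    # {u : grad_h . u + alpha * h >= 0}.
    scale = -slack / g2
    corr = (scale * grad_h[0], scale * grad_h[1])
    u_safe = (u_nom[0] + corr[0], u_nom[1] + corr[1])
    return u_safe, math.hypot(corr[0], corr[1])


# A policy action that drives straight at the obstacle gets slowed down.
u_safe, correction = cbf_filter((2.0, 0.0), (-10.0, 0.0), (0.0, 0.0), 1.0)
```

With one affine constraint the projection has a closed form, so no QP solver is needed. With several obstacles or input limits, a QP would typically take its place. The returned correction norm measures how strongly the filter intervened. Whether the paper's CBF Reward or CBF Decay variants use such a signal is not stated in this summary.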
Abstract
Safety stands as the primary obstacle preventing the widespread adoption of learning-based robotic systems in our daily lives. While reinforcement learning (RL) shows promise as an effective robot learning paradigm, conventional RL frameworks often model safety by using single scalar negative rewards with immediate episode termination, failing to capture the temporal consequences of unsafe actions (e.g., sustained collision damage). In this work, we introduce a novel approach that simulates these temporal effects by applying continuous negative rewards without episode termination. Our experiments reveal that standard RL methods struggle with this model, as the accumulated negative values in unsafe zones create learning barriers. To address this challenge, we demonstrate how Control Barrier Functions (CBFs), with their proven safety guarantees, effectively help robots avoid catastrophic regions while enhancing learning outcomes. We present three CBF-based approaches, each integrating traditional RL methods with Control Barrier Functions, guiding the agent to learn safe behavior. Our empirical analysis, conducted in both simulated environments and real-world settings using a four-wheel differential drive robot, explores the possibilities of employing these approaches for safe robotic learning.
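To make the contrast between the two safety models concrete, here is a minimal sketch comparing the undiscounted return of one trajectory under each. The penalty value, the function names, and the boolean trajectory encoding are assumptions chosen for illustration, not values from the paper.

```python
def terminal_penalty_return(in_unsafe, penalty=-10.0, step_reward=0.0):
    """Conventional model: one scalar penalty, then the episode ends.

    in_unsafe: list of booleans, True when the agent is in an unsafe zone.
    Returns (undiscounted return, number of steps actually experienced).
    """
    total = 0.0
    for t, unsafe in enumerate(in_unsafe):
        if unsafe:
            return total + penalty, t + 1   # terminate on first violation
        total += step_reward
    return total, len(in_unsafe)


def continuous_penalty_return(in_unsafe, penalty=-10.0, step_reward=0.0):
    """Model described in the abstract: penalty on every unsafe step,
    no termination, so the cost grows with time spent in the unsafe zone."""
    total = 0.0
    for unsafe in in_unsafe:
        total += penalty if unsafe else step_reward
    return total, len(in_unsafe)


# Hypothetical trajectory: enters the unsafe zone at step 1, leaves at step 4.
trajectory = [False, True, True, True, False]
terminal = terminal_penalty_return(trajectory)      # (-10.0, 2)
continuous = continuous_penalty_return(trajectory)  # (-30.0, 5)
```

Under the continuous model the same incursion costs three times as much and the episode keeps going. This reflects sustained damage, and it is also how negative value accumulates around unsafe zones.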
