A Review On Safe Reinforcement Learning Using Lyapunov and Barrier Functions
Dhruv Singh Kushwaha, Zoleikha Abdollahi Biron
TL;DR
The paper surveys how Lyapunov and barrier functions can certify stability and enforce constraints for RL policies during both training and deployment, guarantees that standard RL lacks. It groups the literature into Lyapunov-based methods (certificates, safe-set expansion, CMDP constraints, learning signals, physics-informed structures, and extensions), barrier-based methods (barrier shaping, CBF shielding, constrained optimization, and multi-agent/disturbance-aware methods), combined CLF-CBF frameworks, and formal verification. Key contributions include a structured taxonomy of methods, a critical discussion of their strengths and limitations, and a roadmap of open problems such as certificate validity under function approximation, feasibility challenges, and scalable verification. The work emphasizes the practical trade-off between strict safety guarantees and learning efficiency, and highlights directions for bridging theory and real-world deployment in robotics, cyber-physical systems (CPS), and autonomous systems.
Abstract
Reinforcement learning (RL) has proven particularly effective at solving complex decision-making problems across a wide range of applications. From a control-theoretic perspective, RL can be viewed as an adaptive optimal control scheme. In control-theoretic approaches, Lyapunov functions and barrier functions are the most commonly used certificates for guaranteeing, respectively, system stability under a proposed or derived controller and constraint satisfaction. Compared with these methods, however, RL lacks guarantees of closed-loop stability for a computed policy and of constraint satisfaction. Safe reinforcement learning refers to a class of constrained problems in which constraint violations lead to partial or complete system failure. The goal of this review is to provide an overview of safe RL techniques that use Lyapunov and barrier functions to guarantee safety in this sense: stability of the system under a computed policy, and constraint satisfaction during training and deployment. The different approaches are discussed in detail, along with their benefits and shortcomings, to provide a critique and identify possible future research directions. The key motivation for this review is to discuss current theoretical approaches that give RL safety and stability guarantees similar to those of control-theoretic methods, using Lyapunov and barrier functions. The review highlights the demonstrated potential and promising scope of model-based and model-free RL for providing safety guarantees for complex dynamical systems with operational constraints.
