Safe Q-learning for continuous-time linear systems
Soutrik Bandyopadhyay, Shubhendu Bhasin
TL;DR
This work addresses safe learning for continuous-time control of uncertain LTI systems by extending Q-learning with reciprocal control barrier functions to enforce user-defined state constraints. It casts safe learning as a constrained Q-learning problem, derives a safe policy via a Lagrangian, and implements a practical certainty-equivalence controller using online estimates and a constant safety gain $k_{sb}$. An online actor-critic (integral RL) scheme learns the Q-function and policy while guaranteeing forward invariance of the safe set and uniform ultimate boundedness of the weights and states under a persistence of excitation (PE) condition. Simulation confirms safe regulation and illustrates a trade-off between the strength of safety enforcement and control effort, suggesting the method's potential for real-time safety-critical applications without explicit system identification.
Abstract
Q-learning is a promising method for solving optimal control problems for uncertain systems without the explicit need for system identification. However, approaches for continuous-time Q-learning have limited provable safety guarantees, which restricts their applicability to real-time safety-critical systems. This paper proposes a safe Q-learning algorithm for partially unknown linear time-invariant systems to solve the linear quadratic regulator problem with user-defined state constraints. We frame the safe Q-learning problem as a constrained optimal control problem using reciprocal control barrier functions and show that such an extension provides a safety-assured control policy. To the best of our knowledge, Q-learning for continuous-time systems with state constraints has not yet been reported in the literature.
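To make the ingredients named above concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a fully known scalar plant $\dot{x} = ax + bu$, whereas the paper treats partially unknown systems and learns online. It shows a closed-form LQR gain, a reciprocal barrier function $B(x) = 1/h(x)$ with $h(x) = c^2 - x^2$, and one generic way a barrier-gradient term can be added to the nominal control. The plant parameters, the constraint $|x| < c$, the function names, and the specific form of the correction term are all illustrative assumptions; only the symbol $k_{sb}$ is borrowed from the summary, and its use here need not match the paper's controller.

```python
import math


def scalar_lqr_gain(a, b, q, r):
    """LQR gain k for dx/dt = a*x + b*u with running cost q*x^2 + r*u^2."""
    # Stabilizing root of the scalar Riccati equation 2*a*p - (b*p)**2 / r + q = 0.
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r


def reciprocal_barrier(x, c):
    """B(x) = 1 / h(x) with h(x) = c^2 - x^2; grows without bound as |x| -> c."""
    return 1.0 / (c * c - x * x)


def reciprocal_barrier_grad(x, c):
    """dB/dx = 2*x / (c^2 - x^2)^2, valid inside the safe set |x| < c."""
    return 2.0 * x / (c * c - x * x) ** 2


def barrier_augmented_control(x, k, k_sb, b, c):
    """Nominal LQR feedback plus a barrier-gradient term (assumed form)."""
    u_nominal = -k * x
    # Pushes the state away from the boundary; the magnitude scales with k_sb.
    u_barrier = -k_sb * b * reciprocal_barrier_grad(x, c)
    return u_nominal + u_barrier


# Hypothetical numbers chosen only for illustration.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
c = 1.0      # safe set is |x| < c
k_sb = 0.1   # constant safety gain (assumed value)

k = scalar_lqr_gain(a, b, q, r)
for x in (0.0, 0.5, 0.9):
    u = barrier_augmented_control(x, k, k_sb, b, c)
    print(f"x={x:.1f}  B(x)={reciprocal_barrier(x, c):.3f}  u={u:.3f}")
```

In this sketch the correction term is negligible near the origin and dominates near the boundary, and a larger `k_sb` yields a larger control magnitude for the same state. That is one simple way to picture the trade-off between safety enforcement and control effort mentioned in the summary.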
