Safe Q-learning for continuous-time linear systems

Soutrik Bandyopadhyay, Shubhendu Bhasin

TL;DR

This work addresses safe learning for continuous-time control of uncertain LTI systems by extending Q-learning with reciprocal control barrier functions to enforce user-defined state constraints. It casts safe learning as a constrained Q-learning problem, derives a safe policy via a Lagrangian, and implements a practical certainty-equivalence controller using online estimates and a constant safety gain $k_{sb}$. An online actor-critic (integral RL) scheme learns the Q-function and policy while guaranteeing forward invariance of the safe set and uniform ultimate boundedness of the weights and states, under a persistence of excitation (PE) condition. Simulations confirm safe regulation and show a trade-off between safety strength and control effort, indicating the method's potential for real-time safety-critical applications without explicit system identification.

Abstract

Q-learning is a promising method for solving optimal control problems for uncertain systems without the explicit need for system identification. However, approaches for continuous-time Q-learning have limited provable safety guarantees, which restrict their applicability to real-time safety-critical systems. This paper proposes a safe Q-learning algorithm for partially unknown linear time-invariant systems to solve the linear quadratic regulator problem with user-defined state constraints. We frame the safe Q-learning problem as a constrained optimal control problem using reciprocal control barrier functions and show that such an extension provides a safety-assured control policy. To the best of our knowledge, Q-learning for continuous-time systems with state constraints has not yet been reported in the literature.
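The idea of adding a barrier term to a regulator can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's algorithm: the plant is a hypothetical double integrator with a velocity bound `C`, the LQR gain is computed by hand from a known model (the paper instead learns the policy online), the barrier is the simple reciprocal choice $1/(C^2 - x_2^2)$, and the control law $u = -Kx - k_{sb}\, g^\top \nabla B(x)$ is one plausible form that may differ from the paper's eq:finalSafeControlLaw.

```python
import math

# Hypothetical double integrator (not the paper's simulation):
#   x1' = x2,  x2' = u,  with the assumed state constraint |x2| < C.
K1, K2 = 1.0, math.sqrt(3.0)  # LQR gain for Q = I, R = 1, solved by hand from the Riccati equation
C = 1.0                       # assumed bound defining the safe set


def barrier_gradient_x2(x2):
    """Partial derivative w.r.t. x2 of the reciprocal barrier B(x) = 1 / (C**2 - x2**2)."""
    h = C**2 - x2**2
    return 2.0 * x2 / h**2


def simulate(k_sb, x0=(5.0, 0.0), dt=1e-3, t_end=20.0):
    """Forward-Euler rollout; returns (peak |x2|, norm of the final state)."""
    x1, x2 = x0
    peak = abs(x2)
    for _ in range(int(round(t_end / dt))):
        # Nominal LQR feedback.
        u = -(K1 * x1 + K2 * x2)
        # Assumed safety term: g = [0, 1]^T, so g^T grad(B) is dB/dx2.
        if k_sb > 0.0:
            u -= k_sb * barrier_gradient_x2(x2)
        x1, x2 = x1 + dt * x2, x2 + dt * u
        peak = max(peak, abs(x2))
    return peak, math.hypot(x1, x2)


nominal_peak, nominal_final = simulate(k_sb=0.0)  # plain LQR, no barrier term
safe_peak, safe_final = simulate(k_sb=0.1)        # LQR plus barrier term
```

In this toy setting the plain LQR rollout exceeds the velocity bound while the rollout with the barrier term stays inside it and still regulates the state to the origin. The gain `k_sb` here only plays the role of the paper's safety gain by analogy.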

Paper Structure (12 sections, 1 theorem, 34 equations, 2 figures, 1 table)

This paper contains 12 sections, 1 theorem, 34 equations, 2 figures, and 1 table.

Key Result

Theorem 1

For the system in eq:plant, under the critic and actor update laws in eq:criticUpdate and eq:actorUpdate respectively, the control law in eq:finalSafeControlLaw ensures that the state $x$ and the actor and critic weight estimation errors ($\tilde{W}_{a}$ and $\tilde{W}_{c}$) are uniformly ultimately bounded ...

Figures (2)

  • Figure 1: (a) State trajectory for the proposed algorithm; (b) estimated actor weights ($\hat{W}_{a}$) compared with the true control gains ($W_{a}$); (c) norm of the state $x$ compared with the algorithm from vamvoudakis2017SysConLet.
  • Figure 2: The plot of the norm of the state $x$ for the proposed controller under different values of $k_{sb}$.

Theorems & Definitions (6)

  • Definition 1: Reciprocal control barrier function ames2016TAC
  • Remark 1
  • Remark 2
  • Theorem 1
  • Proof
  • Remark 3
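
For readers unfamiliar with Definition 1, the reciprocal control barrier function of ames2016TAC is commonly stated as below. This is a sketch of the standard form for a control-affine system $\dot{x} = f(x) + g(x)u$, with the symbols $h$, $\mathcal{C}$, $U$, and $\alpha_i$ taken from that general formulation; the paper's own notation is not shown in this listing and may differ. For an LTI plant, $f(x) = Ax$ and $g(x)$ is the constant input matrix.

```latex
% Safe set defined by a continuously differentiable function h
\mathcal{C} = \{\, x \in \mathbb{R}^{n} : h(x) \ge 0 \,\}

% B : Int(C) -> R is a reciprocal control barrier function if there exist
% class-K functions \alpha_1, \alpha_2, \alpha_3 such that, for all x in Int(C),
\begin{align}
  \frac{1}{\alpha_{1}(h(x))} \;\le\; B(x) \;&\le\; \frac{1}{\alpha_{2}(h(x))}, \\
  \inf_{u \in U} \bigl[ L_{f}B(x) + L_{g}B(x)\,u - \alpha_{3}(h(x)) \bigr] \;&\le\; 0 .
\end{align}
```

The first inequality forces $B$ to grow without bound as $x$ approaches the boundary of $\mathcal{C}$, and the second requires that some admissible input can limit the growth rate of $B$, which is what yields forward invariance of the interior of the safe set.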