Safe Q-learning for continuous-time linear systems

Soutrik Bandyopadhyay; Shubhendu Bhasin

TL;DR

This work addresses safe learning for continuous-time control of uncertain LTI systems by extending Q-learning with reciprocal control barrier functions to enforce user-defined state constraints. It casts safe learning as a constrained Q-learning problem, derives a safe policy via a Lagrangian, and implements a practical certainty-equivalence controller using online estimates and a constant safety gain $k_{sb}$. An online actor-critic (integral RL) scheme learns the Q-function and policy while guaranteeing forward invariance of the safe set and uniform ultimate boundedness of the weight estimation errors and states under a persistence of excitation (PE) condition. Simulation confirms safe regulation and shows a trade-off between safety strength and control effort, indicating the method's potential for real-time safety-critical applications without explicit system identification.
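To make the role of the safety gain $k_{sb}$ concrete, the following is a minimal sketch and not the paper's controller. It assumes a known double integrator, a velocity constraint $x_2 > -0.2$, the reciprocal barrier $1/h(x)$, and a barrier-gradient term recentred so that it vanishes at the origin. The plant, the constraint, the recentring and the names `simulate`, `safety_term` and `V_MIN` are all illustrative assumptions; the paper's law additionally uses online actor-critic estimates in place of the known gain.

```python
import numpy as np

# Hypothetical example plant (double integrator); not taken from the paper.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([0.0, 1.0])
# LQR gain for state weight I and input weight 1 on this plant (closed form).
K = np.array([1.0, np.sqrt(3.0)])
V_MIN = -0.2  # assumed safe set: velocity x[1] must stay above V_MIN


def h(x):
    """Safe-set function: the state is safe while h(x) > 0."""
    return x[1] - V_MIN


def safety_term(x, k_sb):
    """-k_sb * B^T * grad(1/h), shifted so it is zero at the origin (assumption)."""
    if k_sb == 0.0:
        return 0.0
    h0 = h(np.zeros(2))
    return k_sb * (1.0 / h(x) ** 2 - 1.0 / h0 ** 2)


def simulate(k_sb, x0=(1.0, 0.0), dt=1e-3, t_final=40.0):
    """Forward-Euler rollout; returns the final state and the smallest h seen."""
    x = np.array(x0, dtype=float)
    min_h = h(x)
    for _ in range(int(t_final / dt)):
        u = -K @ x + safety_term(x, k_sb)  # nominal LQR plus barrier term
        x = x + dt * (A @ x + B * u)
        min_h = min(min_h, h(x))
    return x, min_h


x_lqr, min_h_lqr = simulate(k_sb=0.0)     # plain LQR, no safety term
x_safe, min_h_safe = simulate(k_sb=0.01)  # LQR plus barrier term
print("min h without safety term:", min_h_lqr)
print("min h with safety term:   ", min_h_safe)
```

In this sketch the barrier term grows without bound as $h(x) \to 0$, which is what keeps the trajectory inside the safe set; a larger `k_sb` keeps the state further from the boundary at the cost of slower regulation.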

Abstract

Q-learning is a promising method for solving optimal control problems for uncertain systems without the explicit need for system identification. However, approaches for continuous-time Q-learning have limited provable safety guarantees, which restrict their applicability to real-time safety-critical systems. This paper proposes a safe Q-learning algorithm for partially unknown linear time-invariant systems to solve the linear quadratic regulator problem with user-defined state constraints. We frame the safe Q-learning problem as a constrained optimal control problem using reciprocal control barrier functions and show that such an extension provides a safety-assured control policy. To the best of our knowledge, Q-learning for continuous-time systems with state constraints has not yet been reported in the literature.
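As a rough illustration of why a Q-function yields an LQR policy without using the system matrices at decision time, the sketch below builds the quadratic Q-function commonly used in continuous-time LQR Q-learning, $Q(x,u) = V^*(x) + H(x,u,\nabla V^*)$, for a hypothetical double integrator. The plant, the weights `M` and `R`, and all variable names are assumptions for illustration; the paper's exact parameterization is not reproduced in this summary.

```python
import numpy as np

# Hypothetical plant and cost weights (not from the paper).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
M = np.eye(2)          # state weight
R = np.array([[1.0]])  # input weight

# Closed-form solution of the algebraic Riccati equation for this plant.
s3 = np.sqrt(3.0)
P = np.array([[s3, 1.0], [1.0, s3]])
riccati_residual = A.T @ P + P @ A + M - P @ B @ np.linalg.solve(R, B.T @ P)

# Blocks of the quadratic Q-function Q(x, u) = [x; u]^T Q_mat [x; u].
Q_xx = P + P @ A + A.T @ P + M
Q_xu = P @ B
Q_uu = R
Q_mat = np.block([[Q_xx, Q_xu], [Q_xu.T, Q_uu]])


def q_value(x, u):
    """Evaluate the quadratic Q-function at state x and input u."""
    z = np.concatenate([x, u])
    return float(z @ Q_mat @ z)


# The minimizing input depends only on Q-function blocks, not on A or B.
K_from_q = np.linalg.solve(Q_uu, Q_xu.T)

x = np.array([1.0, -0.5])
u_star = -K_from_q @ x
print("gain recovered from Q:", K_from_q)
print("Q(x, u*) =", q_value(x, u_star), " V*(x) =", float(x @ P @ x))
```

Once the blocks `Q_xu` and `Q_uu` are learned from data, the gain follows from them alone, which is the sense in which explicit system identification is avoided. Here the blocks are computed from a known model only so that the result can be checked.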

Paper Structure (12 sections, 1 theorem, 34 equations, 2 figures, 1 table)

Introduction
Contributions
Mathematical notations used
Problem Formulation and Preliminaries
Unconstrained optimal control
Continuous-time Q-learning
Control barrier functions
Safe Q-Learning
Actor-Critic based online learning
Safety and Stability Analysis
Simulation Results
Conclusions

Key Result

Theorem 1

For the system in eq:plant, and under the critic and actor update laws in eq:criticUpdate and eq:actorUpdate respectively, the control law in eq:finalSafeControlLaw ensures that the state $x$ and the actor and critic weight estimation errors ($\tilde{W}_{a}$ and $\tilde{W}_{c}$) are uniformly ultimately bounded …

Figures (2)

Figure 1: (a) State trajectory for the proposed algorithm; (b) estimated actor weights ($\hat{W}_{a}$) compared with the true control gains ($W_{a}$); (c) norm of the state $x$ compared with the algorithm from vamvoudakis2017SysConLet.
Figure 2: The plot of the norm of the state $x$ for the proposed controller under different values of $k_{sb}$.

Theorems & Definitions (6)

Definition 1: Reciprocal control barrier function ames2016TAC
Remark 1
Remark 2
Theorem 1
proof
Remark 3
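
For reference, Definition 1 refers to the reciprocal control barrier function of ames2016TAC. A sketch of its standard form is given below, assuming a control-affine system $\dot{x} = f(x) + g(x)u$ and a safe set $\mathcal{C} = \{x : h(x) \ge 0\}$. The symbols $f$, $g$, $h$, $\mathcal{C}$, $U$ and $\alpha_i$ follow the cited work and may differ from the paper's own notation, which is not reproduced in this summary.

```latex
% Reciprocal control barrier function (standard form, notation assumed):
% B : Int(C) -> R is an RCBF if there exist class-K functions
% \alpha_1, \alpha_2, \alpha_3 such that, for all x in Int(C),
\begin{align}
  \frac{1}{\alpha_1(h(x))} \;\le\; B(x) \;&\le\; \frac{1}{\alpha_2(h(x))}, \\
  \inf_{u \in U} \bigl[ L_f B(x) + L_g B(x)\, u - \alpha_3(h(x)) \bigr] \;&\le\; 0 .
\end{align}
```

The first condition forces $B(x)$ to grow without bound as $x$ approaches the boundary of $\mathcal{C}$, and the second bounds its growth rate along trajectories; together they imply forward invariance of the interior of $\mathcal{C}$ under any locally Lipschitz controller satisfying the second inequality.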
