An efficient data-based off-policy Q-learning algorithm for optimal output feedback control of linear systems

Mohammad Alsalti; Victor G. Lopez; Matthias A. Müller

TL;DR

The paper tackles optimal output regulation for unknown discrete-time LTI systems using only offline input-output data. It extends off-policy Q-learning to the output-feedback setting by introducing a non-minimal state $z_k$ derived from past inputs and outputs, and derives a data-driven generalized Sylvester equation to update the Q-function parameters, ensuring quadratic convergence to the optimal output-feedback gain $K_z^*$. The method requires a persistently exciting input and provides a data-based initialization of a stabilizing policy. Empirical results show the approach is computationally efficient and more scalable than SDP-based alternatives, achieving faster convergence with smaller errors on large-dimensional systems.
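The non-minimal state can be illustrated with a minimal sketch. It assumes that $z_k$ stacks the last $\ell$ inputs followed by the last $\ell$ outputs (the exact ordering is an assumption), and it uses a small hypothetical system `A`, `B`, `C` only to generate data. For an observable system with $\ell$ at least the observability index, the true state is a linear function of $z_k$, which the least-squares residual below checks numerically.

```python
import numpy as np

def build_z(u, y, ell):
    """Stack the last `ell` inputs and outputs into z_k for k = ell, ..., N.

    u has shape (N, m), y has shape (N, p). Column j of the result is
    z_{ell+j} = [u_{k-ell}, ..., u_{k-1}, y_{k-ell}, ..., y_{k-1}].
    The ordering is an illustrative assumption.
    """
    N = u.shape[0]
    cols = [np.concatenate([u[k - ell:k].ravel(), y[k - ell:k].ravel()])
            for k in range(ell, N + 1)]
    return np.array(cols).T

# Hypothetical system, used only to generate input-output data.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
n, m, N, ell = 2, 1, 50, 2

rng = np.random.default_rng(0)
u = rng.standard_normal((N, m))
x = np.zeros((N + 1, n))
x[0] = rng.standard_normal(n)
for k in range(N):
    x[k + 1] = A @ x[k] + B @ u[k]
y = x[:N] @ C.T                      # y_k = C x_k for k = 0, ..., N-1

Z = build_z(u, y, ell)               # columns z_ell, ..., z_N
X = x[ell:].T                        # matching states x_ell, ..., x_N

# Fit x_k = T z_k; a near-zero residual means z_k determines the state.
T_fit, *_ = np.linalg.lstsq(Z.T, X.T, rcond=None)
residual = np.max(np.abs(Z.T @ T_fit - X.T))
print(residual)
```

An output-feedback policy of the form $u_k = -K_z z_k$ then needs only `build_z`-style bookkeeping of past inputs and outputs at run time, not a state measurement.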

Abstract

In this paper, we present a Q-learning algorithm to solve the optimal output regulation problem for discrete-time LTI systems. This off-policy algorithm relies only on persistently exciting input-output data measured offline. No model knowledge or state measurements are needed, and the resulting optimal policy uses only past input-output information. Moreover, our formulation of the proposed algorithm renders it computationally efficient. We provide conditions that guarantee convergence of the algorithm to the optimal solution. Finally, the performance of our method is compared to existing algorithms in the literature.
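To make the off-policy idea concrete, here is a minimal sketch of batch Q-learning by policy iteration for the simpler state-feedback LQR case. It is not the paper's algorithm: policy evaluation here is a plain least-squares fit of the quadratic Q-function rather than a generalized Sylvester equation, the state is assumed to be measured, and the system matrices, the weights `Qc` and `Rc`, and the helper names are all hypothetical. The model is used only to generate data and to compute a reference gain for comparison.

```python
import numpy as np

def quad_features(Z):
    """Rows phi(z_k) with phi(z) @ theta = z' Theta z for symmetric Theta,
    where theta holds the upper-triangular entries of Theta."""
    iu, ju = np.triu_indices(Z.shape[0])
    scale = np.where(iu == ju, 1.0, 2.0)   # off-diagonal terms appear twice
    return (Z[iu, :] * Z[ju, :]).T * scale, (iu, ju)

def q_learning_lqr(X, U, Xn, Qc, Rc, K0, iters=10):
    """Policy iteration on one fixed batch of data (x_k, u_k, x_{k+1})."""
    n, m = X.shape[0], U.shape[0]
    K = K0.copy()
    phi_data, (iu, ju) = quad_features(np.vstack([X, U]))
    cost = (np.einsum("ik,ij,jk->k", X, Qc, X)
            + np.einsum("ik,ij,jk->k", U, Rc, U))
    for _ in range(iters):
        # Evaluate the target policy u = -K x at the successor states.
        phi_next, _ = quad_features(np.vstack([Xn, -K @ Xn]))
        # Bellman equation: phi(z_k) theta - phi(z'_{k+1}) theta = cost_k.
        theta, *_ = np.linalg.lstsq(phi_data - phi_next, cost, rcond=None)
        Theta = np.zeros((n + m, n + m))
        Theta[iu, ju] = theta
        Theta = Theta + Theta.T - np.diag(np.diag(Theta))
        # Policy improvement: minimize the quadratic Q-function over u.
        K = np.linalg.solve(Theta[n:, n:], Theta[n:, :n])
    return K

# Hypothetical stable system, so K0 = 0 is a stabilizing initial policy.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Qc, Rc = np.eye(2), np.eye(1)
N = 60

rng = np.random.default_rng(0)
u = rng.standard_normal((1, N))          # exploratory behaviour input
x = np.zeros((2, N + 1))
x[:, 0] = rng.standard_normal(2)
for k in range(N):
    x[:, k + 1] = A @ x[:, k] + B @ u[:, k]

K_learned = q_learning_lqr(x[:, :-1], u, x[:, 1:], Qc, Rc, np.zeros((1, 2)))

# Model-based reference gain from the Riccati recursion, for comparison only.
P = Qc.copy()
for _ in range(2000):
    G = np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)
    P = Qc + A.T @ P @ A - A.T @ P @ B @ G
K_star = np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)
print(np.max(np.abs(K_learned - K_star)))
```

The same batch of data is reused in every iteration, which is what makes the scheme off-policy: the data come from a random exploratory input, while the policy being evaluated is $u = -Kx$.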

Paper Structure (6 sections, 3 theorems, 28 equations, 1 table, 1 algorithm)

Introduction
Problem formulation
Q-learning algorithm for the LQR problem
Q-learning algorithm for the optimal output regulation problem
Comparisons and Simulation examples
Conclusion

Key Result

lemma 1

Let $\{u_k,y_k\}_{k=-\ell}^{N-1}$ be input-output data collected from the discrete-time LTI system, with the input being persistently exciting (PE) of order $\ell+n+1$. Then $\textup{rank}\left(\right) = m(\ell+1)+n,$ where
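The excitation condition can be checked numerically. The sketch below tests persistency of excitation via the rank of a block-Hankel matrix of the input. As an assumption, it then evaluates the classical state-based rank condition from Willems et al.'s fundamental lemma, which yields the same count $m(\ell+1)+n$ for a controllable system; it is not claimed to be the exact matrix used in Lemma 1. The system and the helper names `block_hankel` and `is_pe` are hypothetical.

```python
import numpy as np

def block_hankel(w, depth):
    """Block-Hankel matrix with `depth` block rows from a signal w of shape (N, q).
    Block row i, column j holds w[i + j]."""
    N, _ = w.shape
    cols = N - depth + 1
    return np.vstack([w[i:i + cols].T for i in range(depth)])

def is_pe(u, order):
    """True if u is persistently exciting of the given order,
    i.e. its depth-`order` Hankel matrix has full row rank."""
    H = block_hankel(u, order)
    return np.linalg.matrix_rank(H) == order * u.shape[1]

# Hypothetical controllable system, used only to generate data.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
n, m, ell, N = 2, 1, 2, 40

rng = np.random.default_rng(1)
u = rng.standard_normal((N, m))
x = np.zeros((N + 1, n))
x[0] = rng.standard_normal(n)
for k in range(N):
    x[k + 1] = A @ x[k] + B @ u[k]

pe_ok = is_pe(u, ell + n + 1)

# State-based analogue: stack H_{ell+1}(u) on top of the matching states.
H_u = block_hankel(u, ell + 1)            # shape (m*(ell+1), N - ell)
X0 = x[:N - ell].T                        # states x_0, ..., x_{N-ell-1}
rank_stack = np.linalg.matrix_rank(np.vstack([H_u, X0]))
print(pe_ok, rank_stack, m * (ell + 1) + n)
```

A random input is generically persistently exciting of any order the data length allows, so the check mainly guards against inputs that are too short or too structured, such as a constant or a single sinusoid.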

Theorems & Definitions (9)

definition 1: Willems05
remark 1: Alsalti23b
lemma 1: Alsalti23b
lemma 2
proof
remark 2
theorem 1
proof
remark 3
