Optimal Output Feedback Learning Control for Discrete-Time Linear Quadratic Regulation

Kedi Xie, Martin Guay, Shimin Wang, Fang Deng, Maobin Lu

TL;DR

This work tackles the discrete-time LQR problem for systems with unknown matrices and unmeasured states by proposing a generalized dynamic output feedback controller that is equivalent to full-state feedback and supports off-policy learning. It develops two model-free ADP-based learning schemes, VI and PI, to estimate the optimal feedback gain using only input-output data, and introduces a model-free stability criterion along with a switched-iteration mechanism to combine the strengths of VI and PI. Convergence, stability, and optimality analyses are provided via rank conditions on the parameterization and data regressors, ensuring the learned policy approaches the optimal $K^*$ despite observer errors. Theoretical results are validated through two numerical examples, including an aircraft system, demonstrating robust performance and faster convergence with the SI framework. Overall, the approach enables guaranteed-convergence, model-free optimal control for unknown discrete-time LQR problems with unmeasurable states, broadening applicability in uncertain environments.
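
For orientation, the optimal gain $K^*$ mentioned above can be written out in the standard textbook form of the discrete-time LQR problem. The symbols $A$, $B$, $Q$, $R$, and $P^*$ below are conventional notation assumed here, not taken from this summary, and the paper may weight the output rather than the state in the stage cost.

```latex
% Plant (matrices A, B unknown to the learner in the paper's setting)
x(k+1) = A x(k) + B u(k)

% Infinite-horizon quadratic cost, with Q \succeq 0 and R \succ 0
J = \sum_{k=0}^{\infty} \left( x(k)^{\top} Q \, x(k) + u(k)^{\top} R \, u(k) \right)

% Discrete-time algebraic Riccati equation for the optimal value matrix P^*
P^* = A^{\top} P^* A + Q - A^{\top} P^* B \left( R + B^{\top} P^* B \right)^{-1} B^{\top} P^* A

% Optimal state feedback
u(k) = -K^* x(k), \qquad K^* = \left( R + B^{\top} P^* B \right)^{-1} B^{\top} P^* A
```

Model-based design solves the Riccati equation directly from $A$ and $B$; the learning schemes summarized here aim to reach the same $K^*$ from input-output data alone.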

Abstract

This paper studies the linear quadratic regulation (LQR) problem of unknown discrete-time systems via dynamic output feedback learning control. In contrast to the state feedback, the optimality of the dynamic output feedback control for solving the LQR problem requires an implicit condition on the convergence of the state observer. Moreover, due to unknown system matrices and the existence of observer error, it is difficult to analyze the convergence and stability of most existing output feedback learning-based control methods. To tackle these issues, we propose a generalized dynamic output feedback learning control approach with guaranteed convergence, stability, and optimality performance for solving the LQR problem of unknown discrete-time linear systems. In particular, a dynamic output feedback controller is designed to be equivalent to a state feedback controller. This equivalence relationship is an inherent property without requiring convergence of the estimated state by the state observer, which plays a key role in establishing the off-policy learning control approaches. By value iteration and policy iteration schemes, the adaptive dynamic programming based learning control approaches are developed to estimate the optimal feedback control gain. In addition, a model-free stability criterion is provided by finding a nonsingular parameterization matrix, which contributes to establishing a switched iteration scheme. Furthermore, the convergence, stability, and optimality analyses of the proposed output feedback learning control approaches are given. Finally, the theoretical results are validated by two numerical examples.
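
To make the difference between the value iteration and policy iteration schemes concrete, here is a minimal sketch for a scalar system $x(k+1) = a\,x(k) + b\,u(k)$ with stage cost $q\,x^2 + r\,u^2$. It is a model-based illustration with known $a$ and $b$, not the paper's model-free output feedback algorithms; the function names and the numeric values are hypothetical choices made for this example.

```python
def riccati_step(p, a, b, q, r):
    """One value-iteration update of the scalar discrete-time Riccati recursion."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)


def gain_from_value(p, a, b, r):
    """Greedy feedback gain k for value p, so that u = -k * x."""
    return a * b * p / (r + b * b * p)


def value_iteration(a, b, q, r, tol=1e-12, max_iter=10_000):
    """Iterate the Riccati recursion from p = 0; no stabilizing gain is needed."""
    p = 0.0
    for j in range(max_iter):
        p_next = riccati_step(p, a, b, q, r)
        if abs(p_next - p) < tol:
            return gain_from_value(p_next, a, b, r), p_next, j + 1
        p = p_next
    raise RuntimeError("value iteration did not converge")


def policy_iteration(a, b, q, r, k0, tol=1e-12, max_iter=100):
    """Alternate policy evaluation and improvement from a stabilizing gain k0."""
    if abs(a - b * k0) >= 1.0:
        raise ValueError("policy iteration needs a stabilizing initial gain")
    k, p_prev = k0, None
    for j in range(max_iter):
        a_cl = a - b * k
        # Policy evaluation: solve p = a_cl^2 * p + q + r * k^2 for p.
        p = (q + r * k * k) / (1.0 - a_cl * a_cl)
        # Policy improvement: greedy gain with respect to the evaluated value.
        k = gain_from_value(p, a, b, r)
        if p_prev is not None and abs(p - p_prev) < tol:
            return k, p, j + 1
        p_prev = p
    raise RuntimeError("policy iteration did not converge")


# Hypothetical open-loop unstable plant (a > 1) with unit weights.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
k_vi, p_vi, vi_iters = value_iteration(a, b, q, r)
k_pi, p_pi, pi_iters = policy_iteration(a, b, q, r, k0=1.0)
print(f"VI: k={k_vi:.6f}, p={p_vi:.6f}, iterations={vi_iters}")
print(f"PI: k={k_pi:.6f}, p={p_pi:.6f}, iterations={pi_iters}")
```

In this sketch value iteration starts from zero without any stabilizing gain but converges linearly, while policy iteration requires a stabilizing initial gain and then converges in fewer iterations. That trade-off is the usual motivation for combining the two, which is the role the abstract assigns to the switched iteration scheme.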

Paper Structure

This paper contains 18 sections, 12 theorems, 114 equations, 3 figures, 5 tables.

Key Result

Theorem 1

Under Assumption 1, there exists a dynamic output feedback controller that is equal to a static state feedback controller $u(k)=-Kx(k)$, $\forall k = 0, 1, 2, \dots$, where $\eta \in \mathbb{R}^{n(m+p+1)}$ is the state of the internal model controller MF z with a user-defined matrix pair $(\mathcal{G}_1, \mathcal{G}_2)$, $\mathcal{K}=KM$ is the control gain with a par

Figures (3)

  • Figure 1: The convergence of Algorithm 3.
  • Figure 4: The convergence of VI-based Algorithm 3.
  • Figure 7: The evolution of input and output.

Theorems & Definitions (17)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Remark 1
  • Lemma 1
  • Remark 2
  • Remark 3
  • Theorem 4
  • Theorem 5
  • Remark 4
  • ...and 7 more