Data-Enabled Policy and Value Iteration for Continuous-Time Linear Quadratic Output Feedback Control

Jun Xie, Yuan-Hua Ni, Yiqin Yang, Bo Xu

Abstract

This paper proposes efficient policy iteration and value iteration algorithms for the continuous-time linear quadratic regulator problem with unmeasurable states and unknown system dynamics, from the perspective of direct data-driven control. Specifically, by re-examining the data characteristics of input-output filtered vectors and introducing QR decomposition, an improved substitute state construction method is presented that further eliminates redundant information, ensures a full row rank data matrix, and enables a complete parameterized representation of the feedback controller. Furthermore, the original problem is transformed into an equivalent linear quadratic regulator problem defined on the substitute state with a known input matrix, verifying the stabilizability and detectability of the transformed system. Consequently, model-free policy iteration and value iteration algorithms are designed that fully exploit the full row rank substitute state data matrix. The proposed algorithms offer distinct advantages: they avoid the need for prior knowledge of the system order or the calculation of signal derivatives and integrals; the iterative equations can be solved directly without relying on the traditional least-squares paradigm, guaranteeing feasibility in both single-output and multi-output settings; and they demonstrate superior numerical stability, reduced data demand, and higher computational efficiency. Moreover, the heuristic results regarding trajectory generation for continuous-time systems are discussed, circumventing potential failure modes associated with existing approaches.

Paper Structure

This paper contains 18 sections, 12 theorems, 69 equations, 3 figures, 2 tables, 2 algorithms.

Key Result

Lemma 2.5

Consider a controllable discrete-time linear time-invariant system $x_{t+1}=Ax_t+Bu_t$, $y_t=Cx_t$. If a PE input signal $\{u_t\}_{t\in\mathbb{N}}$ of order $n+N$ is applied, generating the corresponding state and output sequences $\{x_t\}_{t\in\mathbb{N}}$ and $\{y_t\}_{t\in\mathbb{N}}$, then the f
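The persistency-of-excitation (PE) condition this lemma relies on can be checked numerically. The sketch below assumes the usual Hankel-matrix definition for a scalar input: a sequence is PE of order $L$ when the depth-$L$ Hankel matrix built from it has full row rank $L$. The helper names (`hankel`, `rank`, `is_persistently_exciting`) are hypothetical and do not come from the paper; exact rational arithmetic is used only to avoid tolerance choices in a small illustration.

```python
from fractions import Fraction


def hankel(u, depth):
    """Hankel matrix with `depth` rows built from the scalar sequence u."""
    cols = len(u) - depth + 1
    if cols < 1:
        raise ValueError("sequence too short for the requested depth")
    return [[Fraction(u[i + j]) for j in range(cols)] for i in range(depth)]


def rank(matrix):
    """Exact rank via Gaussian elimination over the rationals."""
    rows = [row[:] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_persistently_exciting(u, order):
    """True if the scalar sequence u is PE of the given order (assumed definition)."""
    return rank(hankel(u, order)) == order


# Hypothetical input sequences for illustration only.
alternating = [1, -1, 1, -1, 1, -1, 1, -1]
two_pulses = [1, 0, 0, 0, 1, 0, 0]
print(is_persistently_exciting(alternating, 2))
print(is_persistently_exciting(two_pulses, 3))
```

For the lemma above, the order to test would be $n+N$; a multi-input version would stack block rows of the input vector instead of scalar entries.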

Figures (3)

  • Figure 1: Comparison of Minimum Singular Values of Data Matrices.
  • Figure 2: Iteration curve of the relative error for Algorithm 1.
  • Figure 3: Iteration curve of the relative error for Algorithm 2.

Theorems & Definitions (31)

  • Remark 2.3
  • Definition 2.4: Discrete-Time PE [Efficient-Q]
  • Lemma 2.5: Willems' Fundamental Lemma [Willems]
  • Definition 2.6: Continuous-Time PE [CT-PE]
  • Lemma 2.7 [CT-PE]
  • Theorem 3.1
  • Remark 3.2
  • Remark 3.3
  • Lemma 3.4
  • proof
  • ...and 21 more