Data-Enabled Policy and Value Iteration for Continuous-Time Linear Quadratic Output Feedback Control

Jun Xie; Yuan-Hua Ni; Yiqin Yang; Bo Xu

Abstract

This paper proposes efficient policy iteration and value iteration algorithms for the continuous-time linear quadratic regulator problem with unmeasurable states and unknown system dynamics, from the perspective of direct data-driven control. Specifically, by re-examining the data characteristics of input-output filtered vectors and introducing a QR decomposition, an improved substitute state construction is presented that further eliminates redundant information, ensures a full row rank data matrix, and enables a complete parameterized representation of the feedback controller. Furthermore, the original problem is transformed into an equivalent linear quadratic regulator problem defined on the substitute state with a known input matrix, and the stabilizability and detectability of the transformed system are verified. Consequently, model-free policy iteration and value iteration algorithms are designed that fully exploit the full row rank substitute state data matrix. The proposed algorithms offer distinct advantages: they require neither prior knowledge of the system order nor the calculation of signal derivatives and integrals; the iterative equations can be solved directly without relying on the traditional least-squares paradigm, which guarantees feasibility in both single-output and multi-output settings; and they demonstrate superior numerical stability, reduced data demand, and higher computational efficiency. Moreover, heuristic results on trajectory generation for continuous-time systems are discussed, which circumvent potential failure modes associated with existing approaches.
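For orientation, the classical model-based policy iteration (Kleinman's iteration) that data-driven schemes of this kind build on can be sketched for a scalar system. This is not the paper's model-free algorithm: it assumes the dynamics $\dot{x} = ax + bu$ are known, uses the cost $\int_0^\infty (q x^2 + r u^2)\,dt$ with feedback $u = -kx$, and the function names `kleinman_scalar` and `riccati_scalar` are hypothetical.

```python
import math


def riccati_scalar(a, b, q, r):
    """Stabilizing solution of the scalar ARE 2*a*p - (b*p)**2 / r + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


def kleinman_scalar(a, b, q, r, k0, iters=50, tol=1e-12):
    """Model-based policy iteration for the scalar LQR problem (illustrative).

    Policy evaluation solves the scalar Lyapunov equation
        2 * (a - b*k) * p + q + r * k**2 = 0
    and policy improvement sets k = b * p / r.
    """
    if a - b * k0 >= 0:
        raise ValueError("initial gain must be stabilizing: a - b*k0 < 0")
    k, p = k0, None
    for _ in range(iters):
        # Policy evaluation: cost of the current gain k.
        p_new = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement: gain that is greedy with respect to p_new.
        k = b * p_new / r
        converged = p is not None and abs(p_new - p) < tol
        p = p_new
        if converged:
            break
    return p, k


if __name__ == "__main__":
    p, k = kleinman_scalar(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
    print(p, riccati_scalar(1.0, 1.0, 1.0, 1.0))
```

The iteration needs a stabilizing initial gain, which is the usual requirement for policy iteration; value iteration is typically presented as the alternative that removes it.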

Paper Structure (18 sections, 12 theorems, 69 equations, 3 figures, 2 tables, 2 algorithms)

Introduction
Problem Formulation and Preliminaries
Problem Formulation
Model-Based State Feedback LQR
Willems' Fundamental Lemma for Discrete-Time Systems
Data Characteristics of Continuous-Time Systems
An Effective Substitution State and Equivalent LQR Problem
Data Characteristics of Effective Substitute State
Equivalent LQR Problem
Efficient Off-Policy Value-Based Iteration Algorithms
Regularity Condition
Policy Iteration Algorithm
Value Iteration Algorithm
Further Discussion
Numerical Experiments
...and 3 more sections

Key Result

Lemma 2.5

Consider a controllable discrete-time linear time-invariant system $x_{t+1}=Ax_t+Bu_t$, $y_t=Cx_t$. If a PE input signal $\{u_t\}_{t\in\mathbb{N}}$ of order $n+N$ is applied, generating the corresponding state and output sequences $\{x_t\}_{t\in\mathbb{N}}$ and $\{y_t\}_{t\in\mathbb{N}}$, then the f
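The persistency-of-excitation (PE) hypothesis in this lemma can be checked numerically. The sketch below uses the standard discrete-time definition, which is assumed here to match Definition 2.4: a signal is PE of order $L$ when its depth-$L$ block-Hankel matrix has full row rank. The helper names `hankel`, `rank`, and `is_pe` are hypothetical, and exact rational arithmetic is used only to keep the rank test free of tolerance choices.

```python
from fractions import Fraction


def hankel(signal, depth):
    """Block-Hankel matrix of a signal given as a list of samples,
    each sample being a list of m numbers. Shape: (m*depth) x (T-depth+1)."""
    cols = len(signal) - depth + 1
    if cols < 1:
        raise ValueError("signal too short for the requested depth")
    m = len(signal[0])
    return [
        [Fraction(signal[i + j][ch]) for j in range(cols)]
        for i in range(depth)
        for ch in range(m)
    ]


def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [row[:] for row in matrix]
    r = 0
    n_cols = len(rows[0]) if rows else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            if factor != 0:
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_pe(signal, order):
    """True if the depth-`order` Hankel matrix has full row rank."""
    h = hankel(signal, order)
    return rank(h) == len(h)


if __name__ == "__main__":
    constant = [[1]] * 10
    pulses = [[1], [0], [0], [0], [1], [0], [0]]
    print(is_pe(constant, 1), is_pe(constant, 2), is_pe(pulses, 4))
```

A constant input is PE of order 1 only, which is why such inputs cannot excite a system enough for the lemma to apply once $n+N \ge 2$.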

Figures (3)

Figure 1: Comparison of Minimum Singular Values of Data Matrices.
Figure 2: Iteration curve of the relative error for Algorithm 1.
Figure 3: Iteration curve of the relative error for Algorithm 2.

Theorems & Definitions (31)

Remark 2.3
Definition 2.4: Discrete-Time PE [Efficient-Q]
Lemma 2.5: Willems' Fundamental Lemma [Willems]
Definition 2.6: Continuous-Time PE [CT-PE]
Lemma 2.7 [CT-PE]
Theorem 3.1
Remark 3.2
Remark 3.3
Lemma 3.4
proof
...and 21 more
