Linear Convergence of Data-Enabled Policy Optimization for Linear Quadratic Tracking

Shubo Kang; Feiran Zhao; Keyou You

TL;DR

The paper addresses direct data-driven control for linear quadratic tracking (LQT) from offline data, aiming to learn an optimal LQT policy without explicit system identification. It introduces a covariance-based parameterization of the LQT policy and a projected gradient update (DeePO) that operates directly on data matrices. By relating DeePO to model-based policy optimization under a positive-definite metric, the authors prove global linear convergence and show that the DeePO solution coincides with the indirect, model-based policy. A numerical experiment corroborates the linear convergence and suggests the method's potential for online adaptive LQT.
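
The covariance parameterization can be illustrated on a small example. The sketch below follows the form used for DeePO in the LQR setting and assumes noise-free offline data and an LQT problem already written as state feedback on some (possibly augmented) state; the system matrices, the gain `K`, and the helper `policy_from_gain` are hypothetical. With $D_0 = [U_0; X_0]$ and $\Phi = D_0 D_0^\top / t$, a gain $K$ is represented by a matrix $V$ with $[K; I] = \Phi V$, so that $K = \bar U_0 V$, $\bar X_0 V = I$, and the closed-loop matrix $A + BK = \bar X_1 V$ is available from data alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state, 1-input system (stand-in for an augmented LQT state)
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
n, m, t = 2, 1, 20  # state dim, input dim, number of samples

# Offline data: t noise-free transition samples (x0, u0) -> x1
X0 = rng.standard_normal((n, t))
U0 = rng.standard_normal((m, t))
X1 = A @ X0 + B @ U0

# Sample covariance of D0 = [U0; X0]; its row blocks are U0_bar and X0_bar
D0 = np.vstack([U0, X0])
Phi = D0 @ D0.T / t
U0_bar, X0_bar = Phi[:m], Phi[m:]
X1_bar = X1 @ D0.T / t


def policy_from_gain(K):
    """Hypothetical helper: return V with [K; I] = Phi @ V (Phi is invertible)."""
    return np.linalg.solve(Phi, np.vstack([K, np.eye(n)]))


K = np.array([[-0.5, -0.5]])
V = policy_from_gain(K)
print(np.allclose(U0_bar @ V, K))          # gain recovered from data
print(np.allclose(X0_bar @ V, np.eye(n)))  # constraint X0_bar V = I
print(np.allclose(X1_bar @ V, A + B @ K))  # closed loop from data alone
```

Since $X_1 = [B\ A] D_0$ for noise-free data, $\bar X_1 = [B\ A]\Phi$, which is why $\bar X_1 V = A + BK$ holds exactly.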

Abstract

Data-enabled policy optimization (DeePO) is a recently proposed method for the open problem of direct adaptive LQR. In this work, we extend the DeePO framework to the linear quadratic tracking (LQT) problem with offline data. By introducing a covariance parameterization of the LQT policy, we derive a direct data-driven formulation of the LQT problem. We then use a gradient descent method to iteratively update the parameterized policy and find an optimal LQT policy. Moreover, by revealing the connection between DeePO and model-based policy optimization, we prove linear convergence of the DeePO iteration. Finally, a numerical experiment validates the convergence results. We hope our work paves the way toward direct adaptive LQT with online closed-loop data.
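
The projected gradient iteration can be sketched in the same style. This is a minimal illustration, not the paper's implementation: it assumes noise-free data, a regulation-form cost $J(V) = \mathrm{Tr}(P_V)$ (initial-state covariance $I$), a hypothetical system, and a backtracking step size chosen for robustness; the step-size condition used in the paper's analysis is not reproduced here. The update $V \leftarrow V - \eta\,\Pi_{\bar X_0}\nabla J(V)$, with $\Pi_{\bar X_0} = I - \bar X_0^{\dagger}\bar X_0$, keeps $\bar X_0 V = I$, and the resulting gain is compared with the indirect gain from the model-based Riccati equation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical system and weights (stand-in for an augmented LQT model)
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
n, m, t = 2, 1, 50

# Offline data and covariance parameterization
X0 = rng.standard_normal((n, t))
U0 = rng.standard_normal((m, t))
X1 = A @ X0 + B @ U0
D0 = np.vstack([U0, X0])
Phi = D0 @ D0.T / t
U0_bar, X0_bar = Phi[:m], Phi[m:]
X1_bar = X1 @ D0.T / t
# Orthogonal projector onto the null space of X0_bar: preserves X0_bar @ V = I
Pi = np.eye(n + m) - np.linalg.pinv(X0_bar) @ X0_bar


def dlyap(M, W):
    """Solve X = W + M X M^T by vectorization (adequate for small n)."""
    k = M.shape[0]
    x = np.linalg.solve(np.eye(k * k) - np.kron(M, M), W.ravel())
    return x.reshape(k, k)


def cost_and_grad(V):
    """Data-based cost J(V) and its gradient; inf if the closed loop is unstable."""
    K, M = U0_bar @ V, X1_bar @ V  # gain and closed-loop matrix from data
    if np.max(np.abs(np.linalg.eigvals(M))) >= 1.0:
        return np.inf, None
    Sigma = dlyap(M, np.eye(n))          # state covariance
    P = dlyap(M.T, Q + K.T @ R @ K)      # value matrix
    grad = 2 * (U0_bar.T @ R @ U0_bar + X1_bar.T @ P @ X1_bar) @ V @ Sigma
    return np.trace(P), grad


# Start from the stabilizing gain K = 0 (A is Schur stable here)
V = np.linalg.solve(Phi, np.vstack([np.zeros((m, n)), np.eye(n)]))
for _ in range(2000):
    J, grad = cost_and_grad(V)
    g = Pi @ grad  # projected gradient
    if np.linalg.norm(g) < 1e-10:
        break
    eta = 1.0
    # Armijo backtracking (an assumption, not the paper's step-size rule)
    while cost_and_grad(V - eta * g)[0] > J - 1e-4 * eta * np.sum(g * g):
        eta /= 2
    V = V - eta * g

# Indirect, model-based reference: iterate the Riccati recursion
P = Q.copy()
for _ in range(5000):
    S = R + B.T @ P @ B
    P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
K_star = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

K_deepo = U0_bar @ V
print(K_deepo, K_star)
```

On the constraint set $\bar X_0 V = I$ the map $V \mapsto \bar U_0 V$ is a bijection onto gains, and $J(V)$ equals the model-based cost of $K = \bar U_0 V$, so the data-driven minimizer and the Riccati gain should agree up to numerical tolerance.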

Paper Structure (15 sections, 7 theorems, 56 equations, 1 figure)

Introduction
Data-driven formulation of the linear quadratic tracking
The model-based LQT problem
Indirect data-driven formulation for LQT
Policy optimization approach for the LQT
Data-enabled policy optimization for LQT
Covariance parameterization for the LQT policy
The DeePO algorithm for solving the data-driven LQT problem
Global linear convergence of the DeePO algorithm
Numerical experiment
Conclusion
Proofs
Proof of Lemma 1 (cost function)
Proof of Lemma 2 (policy gradient)
Proof of the equivalence lemma

Key Result

Lemma 1

For any $\xi \in \mathcal{S}$, it follows that

Figures (1)

Figure 1: Convergence of DeePO for LQT.
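
A linear rate $J(V^k) - J^* \le c\,\rho^k$ appears as a straight line with slope $\log\rho$ when the optimality gap is plotted on a logarithmic scale. The sketch below reads off an empirical contraction factor from such a gap sequence by a least-squares fit of $\log(J(V^k) - J^*)$ against $k$; it uses a synthetic geometric sequence rather than the paper's data, and the function name `linear_rate` is hypothetical.

```python
import numpy as np


def linear_rate(gaps):
    """Fit log(gap_k) = log(c) + k * log(rho) by least squares; return (c, rho)."""
    gaps = np.asarray(gaps, dtype=float)
    k = np.arange(len(gaps))
    slope, intercept = np.polyfit(k, np.log(gaps), 1)  # highest degree first
    return np.exp(intercept), np.exp(slope)


# Synthetic, hypothetical gap sequence with c = 5 and rho = 0.8
gaps = 5.0 * 0.8 ** np.arange(30)
c, rho = linear_rate(gaps)
print(c, rho)  # approximately 5.0 and 0.8
```

A fitted $\rho < 1$ that stays stable across the tail of the sequence is the empirical signature of linear convergence.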

Theorems & Definitions (9)

Lemma 1: Cost function
Lemma 2: Policy gradient
Lemma 3
Lemma 4: Gradient domination (zhao2023global)
Lemma 5: Local smoothness
Proof
Lemma 6
Proof
Theorem 1