Convergence of policy gradient methods for finite-horizon exploratory linear-quadratic control problems

Michael Giegrich; Christoph Reisinger; Yufei Zhang

TL;DR

This work considers a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent, and proposes geometry-aware gradient descents for the mean and covariance using the Fisher geometry and the Bures-Wasserstein geometry.

Abstract

We study the global linear convergence of policy gradient (PG) methods for finite-horizon continuous-time exploratory linear-quadratic control (LQC) problems. The setting includes stochastic LQC problems with indefinite costs and allows additional entropy regularisers in the objective. We consider a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent. Contrary to discrete-time problems, the cost is noncoercive in the policy and not all descent directions lead to bounded iterates. We propose geometry-aware gradient descents for the mean and covariance of the policy using the Fisher geometry and the Bures-Wasserstein geometry, respectively. The policy iterates are shown to satisfy an a-priori bound, and converge globally to the optimal policy with a linear rate. We further propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis, and achieves a robust linear convergence across different action frequencies. A numerical experiment confirms the convergence and robustness of the proposed algorithm.
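To make the idea of a geometry-aware covariance update more concrete, here is a minimal sketch on a static toy problem. It is not the paper's algorithm: the objective $F(V) = \mathrm{tr}(RV) - \tfrac{\rho}{2}\log\det V$, the matrix `R`, the weight `rho`, the step size `tau`, and all function names are assumptions made for illustration. The toy objective mimics a quadratic cost plus a Gaussian entropy regulariser, and its minimiser is $V^\star = \tfrac{\rho}{2}R^{-1}$. Under a common normalisation, the Bures-Wasserstein gradient at $V$ is proportional to $GV + VG$ with $G = \nabla F(V) = R - \tfrac{\rho}{2}V^{-1}$, which simplifies to $RV + VR - \rho I$, so the explicit step below needs no matrix inverse.

```python
# Toy illustration only: a static, time-independent surrogate objective
#   F(V) = tr(R V) - (rho / 2) * log det V,
# NOT the continuous-time problem or the algorithm from the paper.
# All names (matmul, bw_step, run, R, rho, tau) are hypothetical.


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def bw_step(V, R, rho, tau):
    """One explicit step V <- V - tau * (G V + V G), G = R - (rho/2) V^{-1}.

    G V + V G equals R V + V R - rho * I, so V is never inverted.
    """
    n = len(V)
    RV = matmul(R, V)
    VR = matmul(V, R)
    return [
        [
            V[i][j] - tau * (RV[i][j] + VR[i][j] - (rho if i == j else 0.0))
            for j in range(n)
        ]
        for i in range(n)
    ]


def run(R, rho=1.0, tau=0.1, steps=500):
    """Iterate bw_step from the identity matrix and return the final iterate."""
    n = len(R)
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        V = bw_step(V, R, rho, tau)
    return V


# Assumed symmetric positive definite cost matrix for the toy problem.
R = [[2.0, 0.5], [0.5, 1.0]]
V_final = run(R)
```

At a fixed point of this iteration, $RV + VR = \rho I$, whose unique symmetric solution for positive definite $R$ is $V = \tfrac{\rho}{2}R^{-1}$, so `matmul(R, V_final)` should be close to $\tfrac{\rho}{2}I$. In this toy setting the error contracts geometrically for a small enough `tau`. The paper's result concerns time-dependent policies and is considerably more delicate.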

Paper Structure (27 sections, 22 theorems, 131 equations, 1 figure)

Introduction
Issues and challenges from continuous-time models.
Our contributions.
Our approaches and related works.
Notation.
Problem formulation and main results
Regularised stochastic LQ control problems with indefinite costs
Optimisation over Gaussian policies and landscape analysis
Policy optimisation.
Optimisation landscape.
Policy gradient method and its convergence analysis
Geometry-aware policy gradient method.
Convergence analysis.
Mesh-independent linear convergence with discrete-time policies
Proofs
...and 12 more sections

Key Result

Proposition 2.1

Suppose (H.assum:coefficident) holds. For each $\theta\in \Theta$, let $P^\theta\in C([0,T];{\mathbb{S}}^d)$ satisfy eq:lyapunov_reg, and let $\Sigma^\theta\in C([0,T]; \overline{{\mathbb{S}}^d_{+}})$ satisfy eq:lq_sde_K_reg_cov. Then for all $\theta, \theta'\in \Theta$, where for a.e. $t\in [0,T]$,

Figures (1)

Figure 1: Convergence and robustness of the PG method eq:NPG_discrete_finite.

Theorems & Definitions (46)

Remark 2.1
Remark 2.2
Proposition 2.1
Proposition 2.2
Proposition 2.3
Proposition 2.4
Proposition 2.5
Remark 2.3: Implicit regularisation
Theorem 2.6
Theorem 2.7
...and 36 more
