Convergence of policy gradient methods for finite-horizon exploratory linear-quadratic control problems

Michael Giegrich, Christoph Reisinger, Yufei Zhang

TL;DR

This work considers a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent, and proposes geometry-aware gradient descent methods for the mean and the covariance, based on the Fisher geometry and the Bures-Wasserstein geometry, respectively.
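To make the policy class concrete, here is a minimal NumPy sketch of drawing actions from such a Gaussian policy at one fixed time point. The function name `gaussian_policy_sample` and the parameter names `K` (feedback matrix, giving the mean `K @ x`) and `Lam` (state-independent covariance) are illustrative assumptions, not taken from the paper. On a finite horizon both parameters would in general vary with time; here they are frozen at a single t.

```python
import numpy as np

def gaussian_policy_sample(K, Lam, x, rng, n=1):
    """Draw n actions a ~ N(K @ x, Lam) from a Gaussian policy at one fixed time.

    K   : (k, d) feedback matrix, so the mean K @ x is linear in the state x.
    Lam : (k, k) symmetric positive-definite covariance, the same for every x
          (Cholesky needs strict positive definiteness).
    Returns an array of shape (n, k), one sampled action per row.
    """
    mean = K @ x                                  # state-dependent mean, shape (k,)
    chol = np.linalg.cholesky(Lam)                # Lam = chol @ chol.T
    z = rng.standard_normal((n, mean.shape[0]))   # i.i.d. N(0, I) rows
    return mean + z @ chol.T                      # each row ~ N(mean, Lam)


rng = np.random.default_rng(0)
K = np.array([[1.0, -0.5]])   # one action dimension, two state dimensions
Lam = np.array([[0.2]])
actions = gaussian_policy_sample(K, Lam, np.array([2.0, 1.0]), rng, n=5)
```

Because `Lam` does not depend on the state, its Cholesky factor could be computed once per time point and reused for every state visited.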

Abstract

We study the global linear convergence of policy gradient (PG) methods for finite-horizon continuous-time exploratory linear-quadratic control (LQC) problems. The setting includes stochastic LQC problems with indefinite costs and allows additional entropy regularisers in the objective. We consider a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent. In contrast to discrete-time problems, the cost is noncoercive in the policy and not all descent directions lead to bounded iterates. We propose geometry-aware gradient descents for the mean and covariance of the policy using the Fisher geometry and the Bures-Wasserstein geometry, respectively. The policy iterates are shown to satisfy an a-priori bound, and converge globally to the optimal policy with a linear rate. We further propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis, and achieves robust linear convergence across different action frequencies. A numerical experiment confirms the convergence and robustness of the proposed algorithm.
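The two geometries can be illustrated with generic update rules. The sketch below is not the paper's algorithm, whose exact updates are not reproduced in this summary; it is a minimal illustration under stated assumptions. (i) For a policy $\mathcal{N}(Kx, \Lambda)$, the Fisher information with respect to $\mathrm{vec}(K)$ is $\Sigma_x \otimes \Lambda^{-1}$ with $\Sigma_x = \mathbb{E}[xx^\top]$, so a natural-gradient step reads $K \leftarrow K - \eta\, \Lambda (\nabla_K J) \Sigma_x^{-1}$ (applied pointwise in time on a finite horizon). (ii) A Bures-Wasserstein gradient step pushes $\mathcal{N}(0, \Lambda)$ through the linear map $x \mapsto (I - \eta G)x$, giving $\Lambda \leftarrow (I - \eta G)\Lambda(I - \eta G)$, where $G$ is the symmetric Euclidean gradient and a constant factor 2 is absorbed into $\eta$. The function names and the toy objective $F(\Lambda) = \mathrm{tr}(\Lambda) - \log\det\Lambda$ are hypothetical.

```python
import numpy as np

def fisher_feedback_step(K, grad_K, Lam, Sigma_x, eta):
    """Natural-gradient step for the mean feedback K of the policy N(K x, Lam).

    The Fisher information w.r.t. vec(K) is kron(Sigma_x, inv(Lam)) with
    Sigma_x = E[x x^T]; inverting it maps the Euclidean gradient grad_K
    to Lam @ grad_K @ inv(Sigma_x).
    """
    nat_grad = Lam @ np.linalg.solve(Sigma_x, grad_K.T).T   # Lam grad_K Sigma_x^{-1}
    return K - eta * nat_grad

def bw_covariance_step(Lam, grad_Lam, eta):
    """Bures-Wasserstein step: push N(0, Lam) forward through x -> (I - eta*G) x."""
    G = 0.5 * (grad_Lam + grad_Lam.T)         # symmetric part of the Euclidean gradient
    M = np.eye(Lam.shape[0]) - eta * G
    return M @ Lam @ M.T                       # positive semidefinite by construction


# Toy objective F(Lam) = tr(Lam) - log det(Lam): minimised at Lam = I,
# Euclidean gradient I - inv(Lam).
Lam = np.array([[2.0, 0.5], [0.5, 1.0]])
for _ in range(200):
    Lam = bw_covariance_step(Lam, np.eye(2) - np.linalg.inv(Lam), eta=0.2)
```

The covariance update returns $M\Lambda M^\top$, which stays positive semidefinite for any step size, whereas the plain Euclidean step $\Lambda - \eta G$ can leave the cone (for example $\Lambda = \mathrm{diag}(0.1, 1)$, $G = \mathrm{diag}(1, 0)$, $\eta = 0.5$).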

Paper Structure

This paper contains 27 sections, 22 theorems, 131 equations, 1 figure.

Key Result

Proposition 2.1

Suppose (H.assum:coefficident) holds. For each $\theta\in \Theta$, let $P^\theta\in C([0,T];\mathbb{S}^d)$ satisfy (eq:lyapunov_reg), and let $\Sigma^\theta\in C([0,T]; \overline{\mathbb{S}^d_{+}})$ satisfy (eq:lq_sde_K_reg_cov). Then for all $\theta, \theta'\in \Theta$, where for a.e. $t\in [0,T]$,
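The proposition involves the state second-moment path $\Sigma^\theta$ defined through (eq:lq_sde_K_reg_cov), which is not reproduced here. As an illustration only, assume state dynamics $dX_t = (A + BK_t)X_t\,dt + D\,dW_t$, i.e. the exploratory drift averaged over the Gaussian policy with a control-independent diffusion $D$. Then $\Sigma_t = \mathbb{E}[X_t X_t^\top]$ solves the linear matrix ODE $\dot\Sigma_t = (A+BK_t)\Sigma_t + \Sigma_t(A+BK_t)^\top + DD^\top$, and the policy covariance does not enter. The paper's equation may contain further terms (for instance from control-dependent noise). The sketch integrates this simplified ODE with forward Euler; `A`, `B`, `D`, `K_of_t` and the function name are hypothetical.

```python
import numpy as np

def second_moment_path(A, B, D, K_of_t, Sigma0, T, n_steps):
    """Forward-Euler solve of dS/dt = (A + B K_t) S + S (A + B K_t)^T + D D^T.

    Assumes dX = (A + B K_t) X dt + D dW, so S_t = E[X_t X_t^T].
    Returns an array of shape (n_steps + 1, d, d) on a uniform grid of [0, T].
    """
    dt = T / n_steps
    S = np.array(Sigma0, dtype=float)
    DDt = D @ D.T                              # constant noise contribution
    path = [S.copy()]
    for i in range(n_steps):
        Acl = A + B @ K_of_t(i * dt)           # closed-loop drift matrix at t_i
        S = S + dt * (Acl @ S + S @ Acl.T + DDt)
        S = 0.5 * (S + S.T)                    # remove round-off asymmetry
        path.append(S.copy())
    return np.stack(path)


# Scalar example: A = 0, B = 1, K_t = -1, D = 1, Sigma_0 = 1 on [0, 1].
path = second_moment_path(A=np.array([[0.0]]), B=np.array([[1.0]]), D=np.array([[1.0]]),
                          K_of_t=lambda t: np.array([[-1.0]]),
                          Sigma0=np.array([[1.0]]), T=1.0, n_steps=10_000)
```

For this scalar example the exact solution is $\Sigma_t = \tfrac12 + \tfrac12 e^{-2t}$, which the Euler path approximates with an $O(\Delta t)$ error.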

Figures (1)

  • Figure 1: Convergence and robustness of the PG method (eq:NPG_discrete_finite).

Theorems & Definitions (46)

  • Remark 2.1
  • Remark 2.2
  • Proposition 2.1
  • Proposition 2.2
  • Proposition 2.3
  • Proposition 2.4
  • Proposition 2.5
  • Remark 2.3: Implicit regularisation
  • Theorem 2.6
  • Theorem 2.7
  • ...and 36 more