A Stochastic Gradient Descent Approach to Design Policy Gradient Methods for LQR

Bowen Song; Simon Weissmann; Mathias Staudigl; Andrea Iannelli

Abstract

In this work, we propose a stochastic gradient descent (SGD) framework to design data-driven policy gradient descent algorithms for the linear quadratic regulator problem. Two alternative schemes are considered to estimate the policy gradient from stochastic trajectory data: (i) an indirect approach based on online identification, in which the system matrices are first estimated and subsequently used to construct the gradient, and (ii) a direct zeroth-order approach, which approximates the gradient using empirical cost evaluations. In both cases, the resulting gradient estimates are random due to stochasticity in the data, allowing us to use SGD theory to analyze the convergence of the associated policy gradient methods. A key technical step consists of modeling the gradient estimates as suitable stochastic gradient oracles, which, because of the way they are computed, are inherently biased. We derive sufficient conditions under which SGD with a biased gradient oracle converges asymptotically to the optimal policy, and leverage these conditions to design the parameters of the gradient estimation schemes. Moreover, we compare the advantages and limitations of the two data-driven gradient estimators. Numerical experiments validate the effectiveness of the proposed methods.
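To make the two estimation schemes concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the standard policy-gradient LQR setting with $u_t = -Kx_t$, an infinite-horizon cost $C(K) = \mathbb{E}_{x_0}\sum_t (x_t^\top Q x_t + u_t^\top R u_t)$ with $x_0$ of covariance $\Sigma_0$, and the model-based gradient $\nabla C(K) = 2\big((R + B^\top P_K B)K - B^\top P_K A\big)\Sigma_K$; the paper's exact setup may differ. All names (`dlyap`, `model_gradient`, `indirect_gradient`, `direct_gradient`, `radius`, `num_samples`, `noise_std`) and the system matrices are hypothetical. For brevity the direct estimator queries an exact cost and uses a two-point form, whereas the abstract describes empirical cost evaluations.

```python
import numpy as np

def dlyap(F, M):
    """Solve X = M + F X F^T by vectorization (fine for small systems)."""
    n = F.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(F, F), M.reshape(-1))
    return x.reshape(n, n)

def cost(K, A, B, Q, R, Sigma0):
    """Exact cost C(K) = trace(P_K Sigma0) for a stabilizing K (assumed setting)."""
    P = dlyap((A - B @ K).T, Q + K.T @ R @ K)
    return np.trace(P @ Sigma0)

def model_gradient(K, A, B, Q, R, Sigma0):
    """Model-based gradient 2((R + B'PB)K - B'PA) Sigma_K."""
    Acl = A - B @ K
    P = dlyap(Acl.T, Q + K.T @ R @ K)
    Sigma = dlyap(Acl, Sigma0)
    return 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma

def indirect_gradient(K, A, B, Q, R, Sigma0, rng, T=2000, noise_std=0.01):
    """Identify (A, B) by least squares from one noisy trajectory, then plug in.

    The true A, B are used only to simulate data; the estimator sees data alone.
    """
    n, m = B.shape
    x = np.zeros(n)
    Z, X_next = [], []
    for _ in range(T):
        u = -K @ x + rng.standard_normal(m)  # feedback plus exploratory input
        x_new = A @ x + B @ u + noise_std * rng.standard_normal(n)
        Z.append(np.concatenate([x, u]))
        X_next.append(x_new)
        x = x_new
    Theta, *_ = np.linalg.lstsq(np.array(Z), np.array(X_next), rcond=None)
    A_hat, B_hat = Theta.T[:, :n], Theta.T[:, n:]
    return model_gradient(K, A_hat, B_hat, Q, R, Sigma0)

def direct_gradient(K, cost_fn, rng, radius=0.01, num_samples=5000):
    """Two-point zeroth-order estimate using random unit-norm perturbations."""
    d = K.size
    g = np.zeros_like(K)
    for _ in range(num_samples):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)  # uniform direction on the unit Frobenius sphere
        g += (cost_fn(K + radius * U) - cost_fn(K - radius * U)) * U
    return d / (2.0 * radius * num_samples) * g

# Hypothetical open-loop stable system, so K = 0 is a stabilizing policy.
rng = np.random.default_rng(0)
A = np.array([[0.7, 0.2], [0.0, 0.6]])
B = np.array([[0.0], [1.0]])
Q, R, Sigma0 = np.eye(2), np.eye(1), np.eye(2)
K = np.zeros((1, 2))

g_true = model_gradient(K, A, B, Q, R, Sigma0)
g_ind = indirect_gradient(K, A, B, Q, R, Sigma0, rng)
g_dir = direct_gradient(K, lambda Kp: cost(Kp, A, B, Q, R, Sigma0), rng)

rel = lambda g: np.linalg.norm(g - g_true) / np.linalg.norm(g_true)
print("indirect relative error:", rel(g_ind))
print("direct relative error:", rel(g_dir))
```

Both estimates are random and generally do not have the exact gradient as their mean: the indirect one inherits the identification error of $(\hat A, \hat B)$, and the direct one carries a smoothing error that depends on `radius`. This is the sense in which the abstract treats them as biased stochastic gradient oracles.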

Paper Structure (33 sections, 11 theorems, 96 equations, 6 figures, 1 algorithm)

Introduction
Problem setting and Preliminaries
Gradient Estimation and Gradient Oracles
Indirect Gradient Oracle
Direct Gradient Oracle
Convergence Analysis of SGD with Biased Gradient
Closing the loop between SGD and gradient estimators
Indirect Methods
Direct Methods
Comparison between the two gradient estimators
Numerics
Gradient Oracle Analysis
Indirect Method (Algorithm \ref{Algo2})
Direct Method (Algorithm \ref{Algo1})
Convergence Analysis of SGD Algorithm
...and 18 more sections

Key Result

Lemma 1

[pmlr-v80-fazel18aFull] Given any $J_0\geq C(K^*)$, for all $K\in \mathcal{S}(J_0)$, we have $\lVert\nabla C(K)\rVert \leq b_\nabla$ and $\lVert K\rVert \leq b_K$, where the expressions for $b_\nabla$ and $b_K$ are given in (boundedgradienteq) and (boundK) in Appendix DetailedExpression, respectively.

Figures (6)

Figure 1: Data-driven policy gradient descent framework
Figure 2: Indirect Gradient Estimation
Figure 3: Direct Gradient Estimation with Different $v$
Figure 4: SGD with Different Step Sizes and Bias Terms
Figure 5: Indirect Data-driven Policy Gradient Descent
...and 1 more figure

Theorems & Definitions (13)

Lemma 1: Boundedness of $\lVert\nabla C(K)\rVert, \lVert K\rVert$
Lemma 2: Lipschitz continuity of $\Sigma_K,C,\nabla C$
Lemma 3: Gradient Domination
Lemma 4: Quasi-smoothness
Lemma 5: Estimation Error of Gradient
Definition 1: Local Persistency
Lemma 6: Mean-square Boundedness
Lemma 7: Gradient Oracle from Indirect Method
Lemma 8: Gradient Oracle from Direct Method
Lemma 9
...and 3 more
