Policy Gradient Methods for Discrete Time Linear Quadratic Regulator With Random Parameters

Deyue Li

TL;DR

This paper investigates the sub-Gaussianity of the state process and establishes a global linear convergence guarantee for the policy gradient method applied to the discrete-time linear quadratic regulator with random parameters, under assumptions that are weaker and easier to verify than those in existing results.

Abstract

This paper studies an infinite-horizon optimal control problem for a discrete-time linear system with quadratic criteria, both with random parameters that are independent and identically distributed with respect to time. In this general setting, we apply the policy gradient method, a reinforcement learning technique, to search for the optimal control without requiring knowledge of the statistical information of the parameters. We investigate the sub-Gaussianity of the state process and establish a global linear convergence guarantee for this approach based on assumptions that are weaker and easier to verify than those in existing results. Numerical experiments are presented to illustrate our result.
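
To make the setting concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a scalar system $x_{t+1} = a_t x_t + b_t u_t$ with a hypothetical uniform law for $(a_t, b_t)$ and fixed cost weights $q$ and $r$, none of which come from the paper. It estimates the cost of a linear feedback $u_t = -l x_t$ from simulated trajectories alone, which is the kind of sample access a method that does not know the parameter statistics would rely on. All function names and numeric values are illustrative.

```python
import random

def sample_params(rng):
    """Hypothetical i.i.d. parameter law (an assumption, not from the paper):
    a_t ~ U(0.6, 1.0) and b_t ~ U(0.8, 1.2), drawn independently."""
    return rng.uniform(0.6, 1.0), rng.uniform(0.8, 1.2)

def rollout_cost(l, rng, horizon=30, q=1.0, r=1.0, x0=1.0):
    """Truncated cost of one trajectory under the feedback u_t = -l * x_t."""
    x, total = x0, 0.0
    for _ in range(horizon):
        u = -l * x
        total += q * x * x + r * u * u
        a, b = sample_params(rng)  # fresh draw at every step (i.i.d. in time)
        x = a * x + b * u
    return total

def estimate_cost(l, n_rollouts=2000, seed=0):
    """Monte Carlo average of the truncated cost over independent rollouts."""
    rng = random.Random(seed)
    return sum(rollout_cost(l, rng) for _ in range(n_rollouts)) / n_rollouts

if __name__ == "__main__":
    print("estimated cost at l = 0.5:", estimate_cost(0.5))
```

In this scalar case the expected cost also has a closed form, $C(l) = x_0^2\,(q + r l^2) / (1 - \rho(l))$ with $\rho(l) = \mathbb{E}[(a_t - b_t l)^2]$, valid whenever $\rho(l) < 1$, so the estimate can be checked against it.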

Paper Structure (34 sections, 42 theorems, 350 equations, 4 figures, 2 tables, 1 algorithm)

Key Result

Theorem 1

Suppose Assumptions ass:1 and ass:2 hold, and let $L_0\in U_{ad}$. Then for any $\epsilon>0$, the model-based policy gradient update with a constant step size $0<\eta\leq \eta^*$ converges globally to the optimal matrix $L^*$ at a linear rate: … where $\eta^* = \mathcal{C}_{A,B,Q,R,\mu} \left(1+C(L_0)\right)^{-5}$. Moreover, if the number of iterations $k$ satisfies … then the model-based policy grad…
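
As a rough illustration of a constant-step, model-based gradient update of the kind Theorem 1 concerns, here is a scalar sketch under stated assumptions. The system is $x_{t+1} = a_t x_t + b_t u_t$ with feedback $u_t = -l x_t$, the second moments of $(a_t, b_t)$ are taken as known, and the step size is hand-picked rather than derived from the theorem's $\eta^*$, whose constant is not given here. The function names and all numeric values are hypothetical. The gradient iterate is compared with the gain obtained from a scalar Riccati-type fixed-point iteration.

```python
def cost_and_grad(l, Eaa, Eab, Ebb, q, r, s0=1.0):
    """Cost C(l) = s0 * (q + r l^2) / (1 - rho(l)) and its derivative,
    where rho(l) = E[(a - b l)^2] must be below 1 (mean-square stability)."""
    rho = Eaa - 2.0 * l * Eab + l * l * Ebb
    if rho >= 1.0:
        raise ValueError("gain is not mean-square stabilizing")
    num = q + r * l * l
    drho = -2.0 * Eab + 2.0 * l * Ebb
    cost = s0 * num / (1.0 - rho)
    grad = s0 * (2.0 * r * l * (1.0 - rho) + num * drho) / (1.0 - rho) ** 2
    return cost, grad

def riccati_gain(Eaa, Eab, Ebb, q, r, iters=2000):
    """Optimal gain from the fixed point p = q + Eaa p - (Eab p)^2 / (r + Ebb p)."""
    p = q
    for _ in range(iters):
        p = q + Eaa * p - (Eab * p) ** 2 / (r + Ebb * p)
    return Eab * p / (r + Ebb * p)

def policy_gradient(l0, eta, steps, **moments):
    """Plain gradient descent on C(l) with a constant step size eta."""
    l, history = l0, []
    for _ in range(steps):
        cost, grad = cost_and_grad(l, **moments)
        history.append(cost)
        l -= eta * grad
    return l, history

if __name__ == "__main__":
    # Illustrative second moments E[a^2], E[ab], E[b^2] and cost weights.
    moments = dict(Eaa=0.66, Eab=0.8, Ebb=1.02, q=1.0, r=1.0)
    l_hat, history = policy_gradient(l0=0.1, eta=0.05, steps=300, **moments)
    print("gradient iterate:", l_hat, "Riccati gain:", riccati_gain(**moments))
```

The initial gain is chosen so that $\rho(l_0) < 1$, mirroring the requirement in the theorem that $L_0$ be admissible.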

Figures (4)

  • Figure 1: Relative error vs. iteration during policy gradient method on the $n=3, m=2$ example.
  • Figure 2: Numerical performance under varying parameters.
  • Figure 3: Numerical performance when using an adaptive step size ($N=500$).
  • Figure 4: Relative error vs. iteration during policy gradient method on the $n=20, m=10$ example.

Theorems & Definitions (89)

  • Theorem 1: Global Convergence of Model-based Policy Gradient
  • Definition 1
  • Remark 2.1
  • Theorem 2
  • Lemma 4.1: Gradient domination
  • Lemma 4.2: Almost smoothness
  • Remark 4.1: Explanation of the name
  • Lemma 4.3
  • Lemma 4.4: One-step update of model-based gradient descent
  • Remark 4.2
  • ...and 79 more