A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee

Mo Zhou, Jianfeng Lu

TL;DR

This work develops a gradient-flow framework for policy gradient methods in continuous-time stochastic optimal control with controlled diffusion. It derives the continuous-time update for the control $u^{\tau}$ from the cost functional $J[u]$, expresses the gradient in terms of the density $\rho^u$ and the Hamiltonian $G$, and introduces a local optimal control function $u^{\diamond}$ to enable a Polyak–Łojasiewicz-type convergence analysis. The main results prove global convergence of the gradient flow to the optimal control $u^*$ under mild regularity and strong concavity of $G$ in $u$, with a linear convergence rate under an additional modulus condition. The analysis combines barrier-function ideas with viscosity-solution intuition and sets the stage for extensions to actor–critic schemes and to nonlinear stochastic control problems treated via viscosity solutions.
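
As a reminder of why a Polyak–Łojasiewicz (PL) inequality yields a linear rate, here is the standard finite-dimensional argument. It is only a generic sketch: the constant $\mu > 0$, the optimal value $J^*$, and the Euclidean gradient flow $\frac{d}{d\tau} u^\tau = -\nabla J(u^\tau)$ are assumptions made for illustration, and the paper's own condition, stated through $u^{\diamond}$, may take a different form.

$$
\|\nabla J(u)\|^2 \ge 2\mu\,\bigl(J(u) - J^*\bigr)
\quad\Longrightarrow\quad
\frac{d}{d\tau}\bigl(J(u^\tau) - J^*\bigr) = -\|\nabla J(u^\tau)\|^2 \le -2\mu\,\bigl(J(u^\tau) - J^*\bigr),
$$

so that, by Grönwall's inequality,

$$
J(u^\tau) - J^* \le e^{-2\mu\tau}\,\bigl(J(u^0) - J^*\bigr).
$$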

Abstract

We consider policy gradient methods for stochastic optimal control problems in continuous time. In particular, we analyze the gradient flow for the control, viewed as the continuous-time limit of the policy gradient method. We prove the global convergence of the gradient flow and establish a convergence rate under some regularity assumptions. The main novelty in the analysis is the notion of a local optimal control function, which is introduced to characterize the local optimality of the iterate.
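
To illustrate what "continuous-time limit" means here, the following minimal sketch compares explicit gradient-descent iterates with the exact gradient flow on a toy scalar cost. Everything in it is hypothetical: the quadratic cost, the constants `mu`, `u_star`, `u0`, and the helper names are chosen for illustration and are not the paper's cost functional $J[u]$ or its control problem.

```python
import math

def gradient_descent(grad, u0, step, n_steps):
    """Explicit Euler discretization of the gradient flow du/dtau = -grad(u)."""
    u = u0
    for _ in range(n_steps):
        u = u - step * grad(u)
    return u

# Toy cost J(u) = 0.5 * mu * (u - u_star)**2 (an assumption, not the paper's J).
mu, u_star, u0 = 2.0, 1.0, 5.0

def grad(u):
    """Gradient of the toy quadratic cost."""
    return mu * (u - u_star)

def exact_flow(tau):
    """Closed-form solution of du/dtau = -mu * (u - u_star) started at u0."""
    return u_star + (u0 - u_star) * math.exp(-mu * tau)

# Fix the flow time tau and refine the step size: the discrete iterates
# should approach the continuous-time solution as the step shrinks.
tau = 1.0
errors = []
for n_steps in (10, 100, 1000):
    u_n = gradient_descent(grad, u0, tau / n_steps, n_steps)
    errors.append(abs(u_n - exact_flow(tau)))
```

In this toy setting the gap between the discrete iterate and the flow shrinks as the step size decreases, which is the sense in which a gradient flow can stand in for a small-step gradient method.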

Paper Structure

This paper contains 13 sections, 12 theorems, and 295 equations.

Key Result

Proposition 1

Let $u$ be a control function and $V_u$ the corresponding value function. Let the state process $x_t$ start from the uniform distribution on $\mathcal{X}$ and follow the SDE (eq:SDE_X) with control $u$. Then we have [...], where $\dfrac{\delta J}{\delta u}$ denotes the functional derivative of $J$ w.r.t. $u$, and $\nabla_u G$ denotes the gradient of $G$ w.r.t. its third argument (as a vector in $\mathbb{R}$ ...)

Theorems & Definitions (26)

  • Proposition 1
  • Remark
  • Proposition 2: Monotonicity of value function in $\tau$
  • Proposition 3
  • Theorem 1: Critical point for policy gradient
  • Proof
  • Theorem 2: Convergence of the policy gradient
  • Theorem 3: Convergence rate of the policy gradient
  • Proof: Key ideas of the proofs of Theorems 2 and 3
  • Proof: Proof of Proposition 1
  • ...and 16 more