A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee
Mo Zhou, Jianfeng Lu
TL;DR
This work develops a gradient-flow framework for policy gradient methods in continuous-time stochastic optimal control of controlled diffusions. It derives the continuous-time evolution of the control $u^{\tau}$ as the gradient flow of the objective functional $J[u]$. It also expresses the gradient in terms of the density $\rho^u$ and the Hamiltonian $G$, and introduces a local optimal control function $u^{\diamond}$ that enables a Polyak–Łojasiewicz-type convergence analysis. The main results prove global convergence of the gradient flow to the optimal control $u^*$ under mild regularity assumptions and strong concavity of $G$ in $u$. Under an additional modulus condition, the convergence rate is linear. The analysis draws on barrier-function ideas and viscosity-solution intuition. It sets the stage for extensions to actor–critic schemes and to nonlinear stochastic control problems treated with viscosity solutions.
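For intuition on how a Polyak–Łojasiewicz (PL) inequality yields a linear rate, here is the generic finite-dimensional argument. It is not the paper's function-space statement. It assumes three things: $J$ is to be maximized with maximum $J^* = J[u^*]$ (flip the signs for a cost to be minimized); the flow is $\frac{d u^{\tau}}{d\tau} = \nabla J[u^{\tau}]$; and a PL inequality $\tfrac12\|\nabla J[u]\|^2 \ge \mu\,(J^* - J[u])$ holds along the flow with some constant $\mu > 0$:

$$
\frac{d}{d\tau}\bigl(J^* - J[u^{\tau}]\bigr)
  = -\Bigl\langle \nabla J[u^{\tau}], \tfrac{d u^{\tau}}{d\tau} \Bigr\rangle
  = -\bigl\|\nabla J[u^{\tau}]\bigr\|^2
  \le -2\mu\,\bigl(J^* - J[u^{\tau}]\bigr),
\qquad\text{hence}\qquad
J^* - J[u^{\tau}] \le e^{-2\mu\tau}\bigl(J^* - J[u^{0}]\bigr).
$$

The last step is Grönwall's inequality. Per the summary above, the local optimal control function $u^{\diamond}$ is the device that makes a PL-type estimate available in the control setting. There, $J$ is not concave in $u$, so the estimate does not come for free.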
Abstract
We consider policy gradient methods for stochastic optimal control problems in continuous time. In particular, we analyze the gradient flow for the control, viewed as a continuous-time limit of the policy gradient method. We prove the global convergence of the gradient flow and establish a convergence rate under some regularity assumptions. The main novelty in the analysis is the notion of a local optimal control function, which is introduced to characterize the local optimality of the iterate.
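The view of the gradient flow as a continuous-time limit of the policy gradient method can be illustrated on a hypothetical scalar toy problem, not the paper's control problem. A forward-Euler step of $du/d\tau = \nabla J(u)$ with step size $\eta$ is exactly one gradient-ascent update, so the discrete iteration approaches the flow as $\eta \to 0$. The objective $J(u) = -(u^2 + 3\sin^2 u)$ is not concave, but it satisfies a PL inequality. The iterates therefore still reach the global maximizer $u^* = 0$ from any starting point. The objective, the step size, and the names `J`, `grad_J`, `gradient_flow`, and `J_STAR` are illustrative assumptions.

```python
import math

# Hypothetical toy objective (to be maximized): nonconcave, but it satisfies a
# Polyak-Lojasiewicz inequality. Its unique maximizer is u* = 0 with J* = 0.
J_STAR = 0.0


def J(u: float) -> float:
    return -(u**2 + 3.0 * math.sin(u) ** 2)


def grad_J(u: float) -> float:
    # d/du of -(u^2 + 3 sin^2 u) = -(2u + 3 sin(2u))
    return -(2.0 * u + 3.0 * math.sin(2.0 * u))


def gradient_flow(u0: float, step: float = 0.05, n_steps: int = 2000) -> list[float]:
    """Forward-Euler discretization of du/dtau = grad J(u).

    Each Euler step is one gradient-ascent (policy-gradient-style) update.
    Returns the optimality gaps J* - J(u_k) along the trajectory.
    """
    u = u0
    gaps = [J_STAR - J(u)]
    for _ in range(n_steps):
        u = u + step * grad_J(u)
        gaps.append(J_STAR - J(u))
    return gaps


if __name__ == "__main__":
    # Several starting points, including ones far from the maximizer.
    for u0 in (-4.0, 2.5, 7.0):
        gaps = gradient_flow(u0)
        print(f"u0={u0:+.1f}  initial gap={gaps[0]:.3f}  final gap={gaps[-1]:.2e}")
```

The step size 0.05 sits well below $2/L$, where $L = 8$ bounds $|J''|$ for this toy objective. Each update therefore increases $J$, and the gap shrinks monotonically until it reaches zero up to floating-point precision.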
