A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee

Mo Zhou; Jianfeng Lu

TL;DR

This work develops a gradient-flow framework for policy gradient methods in continuous-time stochastic optimal control with controlled diffusions. It derives the continuous-time update for the control $u^{\tau}$ from the cost functional $J[u]$, expresses the gradient in terms of the density $\rho^u$ and the Hamiltonian $G$, and introduces a local optimal control function $u^{\diamond}$ to enable a Polyak–Łojasiewicz-based convergence analysis. The main results prove global convergence of the gradient flow to the optimal control $u^*$ under mild regularity and strong concavity of $G$ in $u$, with a linear convergence rate under an additional modulus condition. The analysis combines barrier-function ideas with viscosity-solution intuition and sets the stage for extensions to actor–critic schemes and to nonlinear stochastic control problems treated with viscosity solutions.
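To see why a Polyak–Łojasiewicz-type inequality leads to a linear rate, consider the following generic sketch. It assumes the flow is the $L^2$ gradient flow $\partial_\tau u^{\tau} = -\dfrac{\delta J}{\delta u}[u^{\tau}]$ and that the inequality holds with some constant $\mu > 0$; the constant, the norm, and the exact form of the inequality are assumptions for illustration, and the paper's condition, which is formulated through $u^{\diamond}$, may differ.

$$
\frac{d}{d\tau} J[u^{\tau}] = -\left\| \frac{\delta J}{\delta u}[u^{\tau}] \right\|_{L^2}^2,
\qquad
\left\| \frac{\delta J}{\delta u}[u] \right\|_{L^2}^2 \ge 2\mu \left( J[u] - J[u^*] \right)
\;\Longrightarrow\;
J[u^{\tau}] - J[u^*] \le e^{-2\mu\tau} \left( J[u^{0}] - J[u^*] \right).
$$

The implication follows from Grönwall's inequality applied to $\dfrac{d}{d\tau}\left( J[u^{\tau}] - J[u^*] \right) \le -2\mu \left( J[u^{\tau}] - J[u^*] \right)$.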

Abstract

We consider policy gradient methods for stochastic optimal control problems in continuous time. In particular, we analyze the gradient flow for the control, viewed as a continuous-time limit of the policy gradient method. We prove the global convergence of the gradient flow and establish a convergence rate under some regularity assumptions. The main novelty in the analysis is the notion of a local optimal control function, which is introduced to characterize the local optimality of the iterate.
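The idea of a gradient flow as the small-step limit of a policy gradient method can be illustrated on a toy problem. The sketch below is not the paper's method: it replaces the functional gradient flow over control functions with plain gradient descent on a single feedback gain $k$. The model (scalar state $dx = -kx\,dt + \sigma\,dW$, control $u = -kx$, running cost $(x^2 + u^2)/2$, no terminal cost) and all function names are hypothetical choices for illustration.

```python
def cost(k, T=1.0, sigma=1.0, m0=1.0, n_steps=1000):
    """Expected cost J(k) of the hypothetical toy model.

    With u = -k x, the second moment m(t) = E[x_t^2] solves
    dm/dt = -2 k m + sigma^2, and the running cost is (1 + k^2) m / 2.
    Both are integrated with explicit Euler steps.
    """
    dt = T / n_steps
    m, J = m0, 0.0
    for _ in range(n_steps):
        J += 0.5 * (1.0 + k * k) * m * dt      # accumulate running cost
        m += (-2.0 * k * m + sigma ** 2) * dt  # advance the second moment
    return J


def grad(k, h=1e-5):
    """Central finite-difference approximation of dJ/dk."""
    return (cost(k + h) - cost(k - h)) / (2.0 * h)


def gradient_flow(k0=0.0, step=0.1, n_iters=200):
    """Explicit Euler discretization of the flow dk/dtau = -dJ/dk."""
    k = k0
    history = [cost(k)]
    for _ in range(n_iters):
        k -= step * grad(k)
        history.append(cost(k))
    return k, history


k_final, history = gradient_flow()
```

Each iteration is one explicit Euler step of size `step` in the iteration time $\tau$. The recorded costs in `history` should decrease along the iterates, mirroring the monotonicity one expects from a gradient flow.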

Paper Structure (13 sections, 12 theorems, 295 equations)

Introduction
Related works
Organization of the paper
Theoretical background: the stochastic optimal control problem
A policy gradient method for the control problem
Convergence of the policy gradient
Conclusion and future directions
Examples
A concrete example
A counter example for multiple critical points of the gradient flow
Proofs for the Propositions
Some auxiliary lemmas
Proof for the theorems

Key Result

Proposition 1

Let $u$ be a control function and $V_u$ the corresponding value function. Let the state process $x_t$ start from the uniform distribution on $\mathcal{X}$ and follow the controlled SDE (eq:SDE_X) with control $u$. Then we have [equation omitted in this excerpt], where $\dfrac{\delta J}{\delta u}$ denotes the functional derivative of $J$ with respect to $u$, and $\nabla_u G$ denotes the gradient of $G$ with respect to its third argument (as a vector in $\mathbb{R}^{\,\cdots}$ […]
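For orientation, here is a sketch of the form such an identity typically takes. It assumes a cost $J[u] = \mathbb{E}\left[\int_0^T r(x_t, u)\,dt + g(x_T)\right]$, a drift $b(x,u)$, a diffusion coefficient $\sigma$ that does not depend on $u$, and the generalized Hamiltonian $G(t,x,u,p,P) = \tfrac{1}{2}\mathrm{Tr}(P\sigma\sigma^{\top}) + p^{\top} b(x,u) - r(x,u)$. The symbols $r$, $g$, $b$, $\sigma$ and the sign conventions are assumptions for illustration and are not stated in this summary, so the paper's exact statement may differ.

$$
\frac{\delta J}{\delta u}(t,x)
= \rho^u(t,x)\left( \nabla_u r\big(x,u(t,x)\big) + \nabla_u b\big(x,u(t,x)\big)^{\top} \nabla_x V_u(t,x) \right)
= -\rho^u(t,x)\, \nabla_u G\big(t,x,u(t,x),-\nabla_x V_u(t,x),-\nabla_x^2 V_u(t,x)\big).
$$

Under these assumptions the functional derivative at $(t,x)$ is the density-weighted gradient of the Hamiltonian, so it vanishes exactly where $\rho^u(t,x) = 0$ or $\nabla_u G = 0$.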

Theorems & Definitions (26)

Proposition 1
Remark
Proposition 2: Monotonicity of value function in $\tau$
Proposition 3
Theorem 1: Critical point for policy gradient
proof
Theorem 2: Convergence of the policy gradient
Theorem 3: Convergence rate of the policy gradient
Proof: Key ideas of the proofs of Theorems 2 and 3
Proof of Proposition 1
...and 16 more
