Solving Time-Continuous Stochastic Optimal Control Problems: Algorithm Design and Convergence Analysis of Actor-Critic Flow

Mo Zhou; Jianfeng Lu

TL;DR

This work formulates the time-continuous stochastic optimal control problem in an actor-critic learning framework. The critic uses a modified least-squares temporal difference (TD) method to estimate the value function and its gradient; the actor uses a policy gradient to update a smooth feedback control. The authors prove a global linear convergence rate for the joint actor-critic flow under suitable regularity and concavity assumptions. They validate the approach on linear-quadratic (LQ) problems and on Aiyagari's growth model from economics, where the learned value function and policy closely match the true ones. The framework offers a convergent, continuous-time alternative to discretize-then-optimize methods for stochastic control, with potential applications in engineering and economics.

Abstract

We propose an actor-critic framework to solve the time-continuous stochastic optimal control problem. A least-squares temporal difference method is applied to compute the value function for the critic. The policy gradient method is implemented as policy improvement for the actor. Our key contribution lies in establishing a linear rate of convergence for our proposed actor-critic flow. Theoretical findings are further validated through numerical examples, showing the efficacy of our approach in practical applications.
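The alternation between a critic (policy evaluation) and an actor (policy improvement) can be illustrated on a scalar LQ problem. The sketch below is not the authors' algorithm: it is a toy under stated assumptions. The dynamics $dx = (a x + b u)\,dt + \sigma\,dW$, the cost $\int_0^T (q x^2 + r u^2)\,dt + g x_T^2$, the linear feedback $u = -k(t)\,x$, and all function and parameter names are hypothetical choices made for illustration. The critic here is an exact backward ODE solve rather than a least-squares TD fit, and the actor step drops the positive state-moment factor that multiplies the policy gradient. Additive noise only shifts the value function by a constant in this setting, so $\sigma$ does not appear.

```python
def evaluate_policy(k, a, b, q, r, g, dt):
    """Critic (assumed form): for u = -k(t) x the value is V(t, x) = p(t) x^2
    plus a noise constant, with -p' = q + r k^2 + 2 (a - b k) p and p(T) = g.
    Integrated backward in time with explicit Euler."""
    n = len(k)
    p = [0.0] * (n + 1)
    p[n] = g
    for i in range(n - 1, -1, -1):
        p[i] = p[i + 1] + dt * (q + r * k[i] ** 2 + 2.0 * (a - b * k[i]) * p[i + 1])
    return p


def actor_critic_lq(a=0.0, b=1.0, q=1.0, r=1.0, g=1.0, T=1.0, n=200,
                    step=0.2, iters=200):
    """Alternate a critic solve with one gradient-type actor step on the gain."""
    dt = T / n
    k = [0.0] * n  # start from the zero control
    for _ in range(iters):
        p = evaluate_policy(k, a, b, q, r, g, dt)
        # Actor: move k along -(d/dk)[q + r k^2 + 2 (a - b k) p] = -2 (r k - b p).
        k = [k[i] - step * 2.0 * (r * k[i] - b * p[i + 1]) for i in range(n)]
    return k, evaluate_policy(k, a, b, q, r, g, dt)


def riccati_reference(a=0.0, b=1.0, q=1.0, r=1.0, g=1.0, T=1.0, n=200):
    """Reference: Euler-discretised Riccati equation -p' = q + 2 a p - b^2 p^2 / r,
    with optimal gain k = b p / r on the same time grid."""
    dt = T / n
    p = [0.0] * (n + 1)
    p[n] = g
    for i in range(n - 1, -1, -1):
        p[i] = p[i + 1] + dt * (q + 2.0 * a * p[i + 1] - (b * p[i + 1]) ** 2 / r)
    k = [b * p[i + 1] / r for i in range(n)]
    return k, p


k, p = actor_critic_lq()
k_ref, p_ref = riccati_reference()
gain_error = max(abs(x - y) for x, y in zip(k, k_ref))
value_error = max(abs(x - y) for x, y in zip(p, p_ref))
print(gain_error, value_error)
```

With the default parameters the Riccati solution is constant in time, which makes the comparison easy to read. Other parameter choices may need a smaller `step` or more iterations; the sketch makes no convergence claim beyond the defaults.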

Paper Structure (14 sections, 13 theorems, 245 equations, 2 figures, 1 table, 1 algorithm)

Introduction
The stochastic optimal control problem
The actor-critic framework
Policy evaluation for the critic
Policy gradient for the actor
The actor-critic flow
Theoretical analysis for the actor-critic flow
Numerical examples
The LQ problem
Aiyagari's growth model in economics
Conclusion and future directions
Proofs for the Propositions
Some auxiliary lemmas
Proofs for the theorems

Key Result

Proposition 1

Let $b,r \in C^2(\mathcal{X} \times \mathbb{R}^{n'})$ and $g,\sigma \in C^2(\mathcal{X})$. Let $u \in C^{1,2}([0,T];\mathcal{X})$ be a control function and $V_u$ be the corresponding value function. Let $\rho^u(t, \cdot)$ be the density for the state process $x_t$ starting with uniform distribution where $\dfrac{\delta }{\delta u}$ denotes the $L^2$ first order variation w.r.t. the function $u(\c
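For orientation, the coefficient names in the proposition are consistent with a controlled diffusion of the following standard form. This is a hedged reading rather than a quotation from the paper: the Brownian motion $W_t$, the feedback form $u(t, x_t)$ inside the coefficients, and the exact sign convention (cost to minimise versus reward to maximise) are assumptions here.

$$
dx_t = b\big(x_t, u(t, x_t)\big)\,dt + \sigma(x_t)\,dW_t, \qquad
J[u] = \mathbb{E}\Big[\int_0^T r\big(x_t, u(t, x_t)\big)\,dt + g(x_T)\Big],
$$

$$
V_u(t, x) = \mathbb{E}\Big[\int_t^T r\big(x_s, u(s, x_s)\big)\,ds + g(x_T) \,\Big|\, x_t = x\Big].
$$

Under this reading, $b$ and $\sigma$ are the drift and diffusion coefficients, $r$ and $g$ are the running and terminal terms of the objective, the critic estimates $V_u$ for the current control, and the actor updates $u$ along the first-order variation of $J$.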

Figures (2)

Figure 1: Numerical results for the LQ problem. First row: plots of the value function, its spatial gradient, and the control function at $t=0$ for the $1d$ LQ problem. Each plot compares the true function with its neural network approximation. Second row: training curve for the $1d$ LQ problem and density plots of the value function and the control function for the $10d$ LQ problem.
Figure 2: Numerical results for Aiyagari's example. The two plots show the value function and the control function. Each plot compares the true function with its neural network approximation.

Theorems & Definitions (31)

Remark 1
Proposition 1
Remark 2
Proposition 2
proof
Proposition 3
Remark 3
Theorem 1: Critical point for the joint dynamic
proof
Theorem 2: Critic improvement
...and 21 more
