Solving Time-Continuous Stochastic Optimal Control Problems: Algorithm Design and Convergence Analysis of Actor-Critic Flow
Mo Zhou, Jianfeng Lu
TL;DR
This work addresses the time-continuous stochastic optimal control problem by formulating it in an actor-critic learning framework. It introduces a modified least-squares temporal difference (TD) critic to estimate the value function and its gradient, together with a policy-gradient actor that updates a smooth feedback control. The authors prove a global linear convergence rate for the joint actor-critic flow under suitable regularity and concavity assumptions, and validate the approach on linear-quadratic (LQ) problems and an economic growth model, demonstrating accurate learning of both the value function and the policy. The framework provides a convergent, continuous-time alternative to discretize-then-optimize methods for stochastic control, with potential applications in engineering and economics.
Abstract
We propose an actor-critic framework to solve the time-continuous stochastic optimal control problem. For the critic, a least-squares temporal difference method is applied to compute the value function. For the actor, the policy gradient method is used for policy improvement. Our key contribution lies in establishing a linear rate of convergence for the proposed actor-critic flow. The theoretical findings are further validated through numerical examples, which demonstrate the efficacy of our approach in practical applications.
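To make the critic/actor alternation concrete, here is a minimal NumPy sketch on a hypothetical scalar problem. It is not the authors' algorithm or their continuous-time flow. It assumes an infinite-horizon discounted linear-quadratic setting, a linear feedback policy u = -k x, a quadratic critic fitted by LSTD(0) on one-step Euler-Maruyama transitions, and an actor that takes a policy-gradient-style step on k using the derivative of the Hamiltonian at states drawn from a fixed uniform distribution. This is a simplification, since a true policy gradient weights states by the discounted occupation measure. All names and parameter values (`A_DRIFT`, `lstd_critic`, `actor_critic`, `riccati_gain`, and so on) are illustrative assumptions.

```python
import numpy as np

# Hypothetical scalar LQ problem (illustrative parameters, not from the paper):
#   dX_t = (A_DRIFT * X_t + B * u_t) dt + SIGMA dW_t,
#   J(u) = E[ int_0^inf exp(-BETA t) (Q X_t^2 + R u_t^2) dt ]   (to be minimized)
A_DRIFT, B, SIGMA, Q, R, BETA = 0.0, 1.0, 0.5, 1.0, 1.0, 1.0
DT = 0.01  # time step used to generate transitions for the critic


def features(x):
    """Quadratic critic features: V(x) is approximated by theta[0] * x^2 + theta[1]."""
    return np.stack([x**2, np.ones_like(x)], axis=1)


def lstd_critic(k, rng, n=200_000):
    """LSTD(0) estimate of the value function of the feedback policy u = -k x."""
    x = rng.uniform(-2.0, 2.0, size=n)
    u = -k * x
    # One Euler-Maruyama step of the controlled SDE from each sampled state.
    noise = SIGMA * np.sqrt(DT) * rng.standard_normal(n)
    x_next = x + (A_DRIFT * x + B * u) * DT + noise
    cost = (Q * x**2 + R * u**2) * DT  # running cost accumulated over one step
    gamma = np.exp(-BETA * DT)  # one-step discount factor
    phi, phi_next = features(x), features(x_next)
    # LSTD(0) normal equations: sum phi (phi - gamma phi')^T theta = sum phi * cost
    a_mat = phi.T @ (phi - gamma * phi_next)
    b_vec = phi.T @ cost
    return np.linalg.solve(a_mat, b_vec)


def actor_critic(k0=0.0, lr=0.2, iters=60, seed=0):
    """Alternate critic evaluation (LSTD) with gradient steps on the gain k."""
    rng = np.random.default_rng(seed)
    k, p_hat = k0, None
    for _ in range(iters):
        p_hat = lstd_critic(k, rng)[0]  # critic: V(x) ~ p_hat * x^2 + const
        x = rng.uniform(-2.0, 2.0, size=10_000)
        # Derivative in k of the u-dependent Hamiltonian terms R u^2 + B u V'(x)
        # at u = -k x, with V'(x) = 2 p_hat x, averaged over the sampled states.
        grad = np.mean(2.0 * (R * k - B * p_hat) * x**2)
        k -= lr * grad  # actor: gradient step on the feedback gain
    return k, p_hat


def riccati_gain():
    """Exact optimal gain k* = B p* / R, where p* > 0 solves the algebraic
    Riccati equation (B^2 / R) p^2 + (BETA - 2 A_DRIFT) p - Q = 0."""
    a2, a1 = B**2 / R, BETA - 2.0 * A_DRIFT
    p_star = (-a1 + np.sqrt(a1**2 + 4.0 * a2 * Q)) / (2.0 * a2)
    return B * p_star / R
```

Calling `k, p_hat = actor_critic()` should return a gain close to `riccati_gain()`, which equals (sqrt(5) - 1)/2 for these parameters. The remaining gap comes from sampling noise and the O(DT) bias of the one-step discretization used by the critic. The quadratic feature set is chosen because it contains the exact value function of any linear policy in this toy problem. That choice isolates the actor-critic interplay from function-approximation error.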
