Consensus Based Stochastic Control
Liyao Lyu, Jingrun Chen
TL;DR
The paper addresses high-dimensional, finite-horizon stochastic optimal control (SOC) without estimating policy gradients. It introduces two consensus-based policy optimization methods, M-CBO and Adam-CBO, that rely only on Monte Carlo estimates of the value function. A population of candidate policies is driven toward the optimal policy by a consensus mechanism combined with momentum-like dynamics and Gaussian exploration noise; Adam-CBO adds an adaptive momentum analogous to Adam. The authors prove well-posedness and a mean-field convergence result for M-CBO, and report strong numerical performance on LQG, Ginzburg-Landau, and mean-field systemic-risk problems, with Adam-CBO performing particularly well in higher dimensions. The work offers a scalable, model-free alternative to gradient-based and discretization-based SOC methods and points to extensions to mean-field games and constrained problems.
Abstract
We propose a gradient-free deep reinforcement learning algorithm to solve high-dimensional, finite-horizon stochastic control problems. Although the recently developed deep reinforcement learning framework has achieved great success in solving these problems, direct estimation of policy gradients from Monte Carlo sampling often suffers from high variance. To address this, we introduce the Momentum Consensus-Based Optimization (M-CBO) and Adaptive Momentum Consensus-Based Optimization (Adam-CBO) frameworks. These methods optimize policies using Monte Carlo estimates of the value function, rather than estimates of its gradients. Adjustable Gaussian noise supports efficient exploration, helping the algorithm converge to optimal policies in complex, nonconvex environments. Numerical results confirm the accuracy and scalability of our approach across various problem dimensions and show the potential for extension to mean-field control problems. Theoretically, we prove that M-CBO can converge to the optimal policy under suitable assumptions.
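
To make the consensus idea concrete, the following is a minimal Python sketch of a momentum-style consensus-based update on a toy one-dimensional control problem. It is an illustration under stated assumptions, not the authors' implementation. The toy problem (dx = u dt + sigma dW with cost E[int (x^2 + u^2) dt + x_T^2]), the linear policy u = -theta x, the function names `rollout_cost`, `consensus_point`, and `momentum_cbo`, and all hyperparameter values are hypothetical choices made here. The update rule is a generic discrete momentum variant of consensus-based optimization and may differ from the M-CBO dynamics analyzed in the paper.

```python
import numpy as np


def rollout_cost(thetas, noise, x0=1.0, sigma=0.3, horizon=1.0):
    """Monte Carlo cost of the linear feedback u = -theta * x, one value per theta.

    Toy problem (an assumption, not taken from the paper):
        dx = u dt + sigma dW,  cost = E[ int_0^T (x^2 + u^2) dt + x_T^2 ].
    `noise` has shape (n_paths, n_steps) and is shared by all particles.
    """
    n_paths, n_steps = noise.shape
    dt = horizon / n_steps
    theta = thetas[:, None]                      # shape (n_particles, 1)
    x = np.full((thetas.size, n_paths), x0)      # shape (n_particles, n_paths)
    running = np.zeros_like(x)
    for k in range(n_steps):
        u = -theta * x
        running += (x**2 + u**2) * dt
        # Euler-Maruyama step of the controlled SDE
        x = x + u * dt + sigma * np.sqrt(dt) * noise[:, k]
    return (running + x**2).mean(axis=1)         # average over sample paths


def consensus_point(thetas, costs, beta):
    """Gibbs-weighted average of the particles; low-cost particles dominate."""
    weights = np.exp(-beta * (costs - costs.min()))  # shift by min for stability
    return np.sum(weights * thetas) / np.sum(weights)


def momentum_cbo(n_particles=30, n_iters=200, lr=0.1, mu=0.9, beta=50.0,
                 explore=0.3, n_paths=128, n_steps=40, seed=0):
    """Gradient-free search for the feedback gain using cost evaluations only."""
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(-1.0, 3.0, size=n_particles)  # initial policy parameters
    velocity = np.zeros(n_particles)
    for _ in range(n_iters):
        # Fresh Monte Carlo sample each iteration, shared across particles
        noise = rng.standard_normal((n_paths, n_steps))
        costs = rollout_cost(thetas, noise)
        center = consensus_point(thetas, costs, beta)
        drift = thetas - center
        # Moving average of the drift toward consensus (momentum-like term)
        velocity = mu * velocity + (1.0 - mu) * drift
        # Gaussian exploration scaled by the distance to the consensus point
        kick = explore * np.sqrt(lr) * drift * rng.standard_normal(n_particles)
        thetas = thetas - lr * velocity + kick
    return center


if __name__ == "__main__":
    print("estimated feedback gain:", momentum_cbo())
```

For this toy problem the Riccati equation gives the optimal feedback u = -x, so `momentum_cbo()` should return a gain near 1, up to Monte Carlo and time-discretization error. Only cost evaluations enter the update; no gradient of the cost with respect to theta is computed at any point.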
