Consensus Based Stochastic Control

Liyao Lyu, Jingrun Chen

TL;DR

The paper tackles high-dimensional finite-horizon stochastic optimal control without gradient estimation by introducing gradient-free, consensus-based policy optimization (M-CBO and Adam-CBO). These methods treat policy optimization as a Monte Carlo evaluation problem, using a consensus mechanism and momentum-like dynamics with Gaussian exploration to drive the population toward the optimal policy; Adam-CBO adds an adaptive momentum akin to Adam. The authors prove well-posedness and a mean-field convergence result for M-CBO and demonstrate strong numerical performance across LQG, Ginzburg-Landau, and mean-field systemic-risk problems, with Adam-CBO particularly excelling in higher dimensions. The work offers a scalable, model-free alternative to gradient-based and discretization-based SOC methods and points to extensions to mean-field games and constrained problems.
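The consensus-plus-momentum idea can be sketched on a toy problem. The snippet below is a minimal illustration, not the paper's M-CBO dynamics: the update rule, the function names (`consensus_point`, `momentum_cbo`, `quadratic`), and every hyperparameter are assumptions chosen for readability. Particles are pulled toward a softmin-weighted average of the population, a moving average of that pull plays the role of momentum, and Gaussian noise scaled by the distance to consensus provides exploration. Only cost evaluations are used, never gradients.

```python
import numpy as np

def consensus_point(theta, costs, beta):
    """Softmin-weighted particle average; weights are proportional to exp(-beta * cost)."""
    w = np.exp(-beta * (costs - costs.min()))  # subtract the min for numerical stability
    return (w[:, None] * theta).sum(axis=0) / w.sum()

def momentum_cbo(cost, dim, n_particles=100, steps=500, lr=0.1,
                 beta=30.0, b1=0.9, sigma=0.3, seed=0):
    """Toy gradient-free minimizer in which particles drift toward a consensus point.

    Assumption: this discrete update is a generic illustration, not the paper's SDE.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 1.0, size=(n_particles, dim))  # particle positions
    omega = np.zeros_like(theta)                            # momentum variables
    for _ in range(steps):
        m = consensus_point(theta, cost(theta), beta)
        drift = m - theta
        omega = b1 * omega + (1.0 - b1) * drift             # moving average of the drift
        noise = rng.normal(size=theta.shape)
        # Exploration noise is scaled by the distance to consensus,
        # so it vanishes once the particles agree.
        theta = theta + lr * omega + sigma * np.sqrt(lr) * drift * noise
    return consensus_point(theta, cost(theta), beta)

def quadratic(theta):
    """Hypothetical test objective with its minimizer at (1, ..., 1)."""
    return ((theta - 1.0) ** 2).sum(axis=1)

estimate = momentum_cbo(quadratic, dim=2)
```

In a control setting, `cost` would be replaced by a Monte Carlo estimate of the expected cost of the policy whose parameters are stored in each particle.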

Abstract

We propose a gradient-free deep reinforcement learning algorithm to solve high-dimensional, finite-horizon stochastic control problems. Although the recently developed deep reinforcement learning framework has achieved great success in solving these problems, direct estimation of policy gradients from Monte Carlo sampling often suffers from high variance. To address this, we introduce the Momentum Consensus-Based Optimization (M-CBO) and Adaptive Momentum Consensus-Based Optimization (Adam-CBO) frameworks. These methods optimize policies using Monte Carlo estimates of the value function, rather than its gradients. Adjustable Gaussian noise supports efficient exploration, helping the algorithm converge to optimal policies in complex, nonconvex environments. Numerical results confirm the accuracy and scalability of our approach across various problem dimensions and show the potential for extension to mean-field control problems. Theoretically, we prove that M-CBO can converge to the optimal policy under some assumptions.
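To make "Monte Carlo estimates of the value function" concrete, here is a hedged sketch of how the expected cost of a fixed policy could be estimated by simulating trajectories. The terminal cost is the one quoted in the Figure 1(a) caption; the dynamics $dX = a\,dt + \sqrt{2}\,dW$, the quadratic running cost, and all names and defaults (`mc_cost`, `zero_policy`, step and path counts) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def terminal_cost(x):
    """g(x) = ln((1 + |x|^2) / 2), the terminal cost quoted in the Figure 1(a) caption."""
    return np.log((1.0 + (x ** 2).sum(axis=1)) / 2.0)

def mc_cost(policy, dim, horizon=1.0, n_steps=20, n_paths=2000,
            noise_scale=np.sqrt(2.0), seed=0):
    """Monte Carlo estimate of E[ int_0^T |a_t|^2 dt + g(X_T) ] starting from x = 0.

    Assumption: the dynamics dX = a dt + noise_scale dW and the quadratic
    running cost are placeholders chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    x = np.zeros((n_paths, dim))
    running = np.zeros(n_paths)
    for k in range(n_steps):
        a = policy(k * dt, x)                        # controls, shape (n_paths, dim)
        running += (a ** 2).sum(axis=1) * dt         # accumulate the running cost
        dw = rng.normal(scale=np.sqrt(dt), size=x.shape)
        x = x + a * dt + noise_scale * dw            # Euler-Maruyama step
    return float((running + terminal_cost(x)).mean())

def zero_policy(t, x):
    """Hypothetical baseline policy that applies no control."""
    return np.zeros_like(x)

cost_no_control = mc_cost(zero_policy, dim=4)
```

A gradient-free optimizer only needs this scalar for each candidate policy, so the variance of a gradient estimator never enters; the variance of the cost estimate itself still depends on `n_paths`.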

Paper Structure

This paper contains 17 sections, 16 theorems, 149 equations, 7 figures, 3 tables, 2 algorithms.

Key Result

Theorem 1

Under Assumption assum:Lip_J, for each $N\in \mathbb N$, the stochastic differential equation equ:CBO_SDE has a unique strong solution $\left\{\left(\boldsymbol{\Theta}^{(N)}_{t}, \boldsymbol{\Omega}^{(N)}_{t}\right) \mid t>0\right\}$ for any initial condition $\left (\boldsymbol{\Theta}^{(N)}_0,

Figures (7)

  • Figure 1: The value function $u(t=0, \mathbf{x}=(0, \ldots, 0))$ evaluated using BSDE method, M-CBO method, Adam-CBO method (our method), and MC estimation (reference) for problems in $1, 2, 4, 8,$ and $16$ dimensions. (a) The terminal cost function $g(\mathbf{x}) = \ln \frac{1 + \| \mathbf{x} \|^2}{2}$. (b) The terminal cost function $g(\mathbf{x}) = \ln \frac{1 + (\| \mathbf{x} \|^2 - 1)^2}{2}$.
  • Figure 2: The value function $u(t,x)$ in the one-dimensional case, computed using BSDE method, MC Estimation (reference), and Adam-CBO (our method), with terminal cost $g(\mathbf{x}) = \ln \frac{1 + \| \mathbf{x} \|^2}{2}$.
  • Figure 3: The value function $u(t,x)$ in the one-dimensional case, computed using BSDE method, MC Estimation (reference), and Adam-CBO (our method), with terminal cost $g(\mathbf{x}) = \ln \frac{1 + (\| \mathbf{x} \|^2 - 1)^2}{2}$.
  • Figure 4: The value function $u(t=0,0,0,0,0)$ of the 4D LQC problem, computed using BSDE method, MC Estimation (reference), and Adam-CBO (our method), with terminal cost $g(\mathbf{x}) = \ln \frac{1 + (\|\mathbf{x}\|^2 - 5)^2}{2}$, evaluated under varying sample sizes per step.
  • Figure 5: Distribution of $x_1$ before and after control in the 1D Ginzburg-Landau model.
  • ...and 2 more figures

Theorems & Definitions (36)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • Theorem 4
  • proof
  • Lemma 1
  • proof
  • ...and 26 more