Recent Developments in Machine Learning Methods for Stochastic Control and Games

Ruimeng Hu, Mathieu Laurière

TL;DR

This survey reviews neural network–based methods for solving stochastic control problems and differential games, with emphasis on high-dimensional and complex dynamics, including delays and common noise. It connects deep BSDEs, PDE-based deep learning, and dynamic-programming–inspired direct parameterization, highlighting model-based and model-free approaches as well as mean-field formulations. Key contributions include (i) a systematic distillation of the Deep BSDE, DBDP, and Deep Galerkin methodologies, (ii) extended treatments of delay, mean-field control, N-player games, and mean-field games with common noise, and (iii) demonstrations in price-impact, epidemic-control, and systemic-risk contexts. The practical impact lies in providing scalable, architecture-informed frameworks for approximating Nash equilibria and optimal controls in high-dimensional stochastic systems, with emphasis on learning-driven, data-efficient approaches and master-equation perspectives.

Abstract

Stochastic optimal control and games have a wide range of applications, from finance and economics to social sciences, robotics, and energy management. Many real-world applications involve complex models that have driven the development of sophisticated numerical methods. Recently, computational methods based on machine learning have been developed for solving stochastic control problems and games. In this review, we focus on deep learning methods that have unlocked the possibility of solving such problems, even in high dimensions or when the structure is very complex, beyond what traditional numerical methods can achieve. We consider mostly the continuous-time and continuous-space setting. Many of the new approaches build on recent neural-network-based methods for solving high-dimensional partial differential equations or backward stochastic differential equations, or on model-free reinforcement learning for Markov decision processes that have led to breakthrough results. This paper provides an introduction to these methods and summarizes the state-of-the-art works at the crossroads of machine learning and stochastic control and games.

Paper Structure (55 sections, 201 equations, 21 figures, 5 algorithms)

Figures (21)

  • Figure 1: A case study of the COVID-19 pandemic in three states, New York (NY), New Jersey (NJ), and Pennsylvania (PA), from xuan2020optimal. Plots of the optimal policies (top-left) and of the Susceptible (top-right), Exposed (bottom-left), and Infectious (bottom-right) populations for New York (blue), New Jersey (orange), and Pennsylvania (green). A large $\ell$ indicates a high-intensity lockdown policy. Parameter choices follow xuan2020optimal.
  • Figure 2: The illustrative linear-quadratic model in Section \ref{sec:intro-LQsysrisk}. Panels (a) and (b) give three trajectories of $X_t$, $m_t = \mathbb{E}[X_t \vert \mathcal{F}_t^{W^0}]$ (solid lines) and their approximations $\widehat{X}_t$ (dashed lines) using different realizations of $(X_0, W, W^0)$ from validation data. Panel (c) shows the minimized cost computed using validation data over fictitious play iterations. Parameter choices are given in MinHu:21.
  • Figure 3: The linear-quadratic regulator problem with delay in Section \ref{sec:sc_delay}. Left: training curves of the two models in the linear-quadratic example. Right: the effect of the lag time $\bar{\delta}$ processed by the feedforward model in the same example. The lag time $\delta$ in the actual system is $1$.
  • Figure 4: The linear-quadratic regulator problem with delay in Section \ref{sec:sc_delay}. A sample path of the first 5 dimensions of the state $X_t$ and control $\alpha_t$ obtained from the LSTM model (top) and the FNN model (bottom). Left: the optimal state process discretized from the analytical solution $(X_t)_i$ (solid lines) and its approximation $(\hat{X}_t)_i$ (dashed lines) provided by the approximating control, under the same realized path of Brownian motion. Right: comparison of the optimal control $(\alpha_t)_i$ (solid lines) and its approximation $(\hat{\alpha}_t)_i$ (dashed lines).
  • Figure 5: Price impact MFC example in Section \ref{sec:directMethod}, solved by the direct method. Left: learned control (dots) and exact solution (lines). Right: associated empirical state distribution. Here we take $\gamma = 0.2$ in \ref{eq:priceimpact}.
  • ...and 16 more figures

Theorems & Definitions (32)

  • Definition 2.1: Stochastic control problem
  • Remark 2.2
  • Remark 2.3
  • Remark 2.4
  • Remark 2.5: Theoretical analysis
  • Remark 2.6
  • Remark 2.7: Theoretical analysis
  • Remark 2.8
  • Remark 2.9: Theoretical analysis
  • Remark 2.10
  • ...and 22 more