Convergence of the deep BSDE method for stochastic control problems formulated through the stochastic maximum principle

Zhipeng Huang, Balint Negyesi, Cornelis W. Oosterlee

TL;DR

This paper addresses high-dimensional stochastic control problems formulated via the stochastic maximum principle (SMP) and develops a deep SMP-BSDE method to solve the resulting vector-valued FBSDEs. The authors prove an a-posteriori convergence bound that relates the numerical error to the time discretization and to the terminal loss, generalizing scalar results to multi-dimensional settings. They also compare the SMP-based approach with HJB- and DP-based deep BSDE methods, highlighting its ability to handle diffusion control. Numerical experiments on linear-quadratic-type problems show accurate, scalable performance and practical convergence guarantees in high dimensions.

Abstract

It is well known that decision-making problems from stochastic control can be formulated by means of a forward-backward stochastic differential equation (FBSDE). Recently, the authors of Ji et al. 2022 proposed an efficient deep learning algorithm based on the stochastic maximum principle (SMP). In this paper, we provide a convergence result for this deep SMP-BSDE algorithm and compare its performance with other existing methods. In particular, by adopting a strategy as in Han and Long 2020, we derive an a-posteriori estimate and show that the total approximation error can be bounded by the value of the loss functional and the discretization error. We present numerical examples for high-dimensional stochastic control problems, with both drift and diffusion control, which showcase superior performance compared to existing algorithms.
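To make the role of the loss functional concrete, here is a minimal sketch of how a terminal loss can be estimated by Monte Carlo on an Euler-Maruyama grid. It is not the authors' algorithm: the toy problem is uncontrolled, driftless and one-dimensional, the terminal condition $Y_T = g'(X_T)$ with $g(x) = x^2/2$ is an assumption chosen so that the exact solution ($Y_t = X_t$, $Z_t = \sigma$) is known, and the function name `terminal_loss` and all parameter values are hypothetical. In a deep BSDE method the constant guesses `y0` and `z` would be replaced by trainable parameters and neural networks.

```python
import numpy as np

def terminal_loss(y0, z, x0=1.0, sigma=0.3, T=1.0, N=50, M=2**12, seed=0):
    """Monte Carlo estimate of E|Y_N - g'(X_N)|^2 for a toy driftless FBSDE.

    Toy setup (an assumption for illustration, not the paper's example):
      forward:  dX = sigma dW,            X_0 = x0
      backward: dY = Z dW (zero driver),  Y_T = g'(X_T), with g(x) = x^2 / 2
    The exact solution is Y_t = X_t and Z_t = sigma.
    """
    rng = np.random.default_rng(seed)
    h = T / N                       # uniform time step
    x = np.full(M, float(x0))       # M simulated state paths
    y = np.full(M, float(y0))       # forward rollout of the backward component
    for _ in range(N):
        dw = rng.normal(0.0, np.sqrt(h), size=M)  # Brownian increments
        x = x + sigma * dw          # Euler-Maruyama step for X
        y = y + z * dw              # same increments drive Y through Z
    return float(np.mean((y - x) ** 2))  # g'(x) = x for this toy g

# Exact initial value and Z: the terminal mismatch vanishes.
exact = terminal_loss(y0=1.0, z=0.3)
# Initial value off by 0.5: the loss equals the squared offset.
biased = terminal_loss(y0=1.5, z=0.3)
```

In this toy case the loss is zero exactly when the guess matches the true solution and grows with the error in `y0`, which is the qualitative behavior that an a-posteriori bound of this type formalizes.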

Paper Structure

This paper contains 7 sections, 5 theorems, 70 equations, 5 figures, and 1 algorithm.

Key Result

Theorem 1

Let Assumption (assume_adjoint) hold, and let $\left( X_t^*, u_t^*, P_t^*, Q_t^* \right)$ be an admissible 4-tuple. Suppose that $g(\cdot)$ is convex, that $\bar{H}\left(t, \cdot, \cdot, P_t^*, Q_t^* \right)$ defined by (def:Ham) is concave for all $t \in [0, T]$, $\mathbb{P}$-almost surely, and that the maximum condition holds. Then $\left( X_t^*, u_t^* \right)$ is an optimal pair of problem (stochastic_control_problem).
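
For orientation, one common textbook convention for the objects this theorem refers to is sketched below. This is an assumption for illustration only: the drift $b$, diffusion $\sigma$, running cost $f$, control set $U$, and the Hamiltonian $H$ written here are not stated in this summary, and the paper's own definitions (in particular $\bar{H}$ and the adjoint equation) may differ in signs or arguments.

$$
\begin{aligned}
&\text{minimize } J(u) = \mathbb{E}\Big[\int_0^T f(t, X_t, u_t)\,dt + g(X_T)\Big], \qquad dX_t = b(t, X_t, u_t)\,dt + \sigma(t, X_t, u_t)\,dW_t,\\
&H(t, x, u, p, q) = \langle p, b(t, x, u)\rangle + \operatorname{tr}\big(q^\top \sigma(t, x, u)\big) - f(t, x, u),\\
&dP_t = -\nabla_x H(t, X_t, u_t, P_t, Q_t)\,dt + Q_t\,dW_t, \qquad P_T = -\nabla_x g(X_T),\\
&H(t, X_t^*, u_t^*, P_t^*, Q_t^*) = \max_{u \in U} H(t, X_t^*, u, P_t^*, Q_t^*).
\end{aligned}
$$

Under this convention a cost is minimized, yet the Hamiltonian carries $-f$ and the terminal adjoint value carries $-\nabla_x g$, which is why convexity of $g$ pairs with concavity of the Hamiltonian and a maximum condition in the statement above.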

Figures (5)

  • Figure 1: Example 1, $N=50$. Errors of approximations of the optimally controlled state space and control strategy. On top: results obtained through Algorithm 1 and the SMP. On the bottom: reference method from [andersson2023] using dynamic programming. Lines correspond to the mean of 5 independent runs of the algorithm, shaded areas to the standard deviation. Graphs computed over an independent Monte Carlo sample of size $M=2^{14}$.
  • Figure 2: Convergence and empirical convergence rates of Algorithm 1 over $N$. Lines correspond to the mean of 5 independent runs of the algorithm, error bars to the standard deviation. Graphs computed over an independent Monte Carlo sample of size $M=2^{14}$.
  • Figure 3: Relative $L^2$ approximation errors over time, $N=100$. Lines correspond to the mean of 5 independent runs of the algorithm, shaded areas to the standard deviation. Graphs computed over an independent Monte Carlo sample of size $M=2^{14}$.
  • Figure 4: Convergence of the a-posteriori error estimate defined in (eq:thm4:bound). Lines correspond to the mean of 5 independent runs of the algorithm, error bars to the standard deviation. Graphs computed over an independent Monte Carlo sample of size $M=2^{14}$.
  • Figure 5: Example 2, $N=50$. Errors of the approximations of the optimally controlled state space and control strategy through Algorithm 1 and the SMP. Lines correspond to the mean of 5 independent runs of the algorithm, shaded areas to the standard deviation. Graphs computed over an independent Monte Carlo sample of size $M=2^{14}$.

Theorems & Definitions (17)

  • Remark 1
  • Remark 2
  • Theorem 1: Stochastic maximum principle
  • Remark 3
  • Remark 4
  • Remark 5
  • Remark 6
  • Theorem 2: Convergence of the implicit scheme
  • Remark 7
  • Remark 8
  • ...and 7 more