Martingale deep learning for very high dimensional quasi-linear partial differential equations and stochastic optimal controls

Wei Cai, Shuixin Fang, Wenzhong Zhang, Tao Zhou

TL;DR

The paper tackles the challenge of solving very high-dimensional parabolic PDEs and Hamilton-Jacobi-Bellman equations arising in stochastic optimal control. It introduces a derivative-free martingale deep learning framework that reframes the PDEs as martingale conditions using a pilot process, enabling offline path generation and parallel training across time and space. A weak Galerkin formulation with adversarial test functions eliminates the need for conditional expectations, while a Policy Improvement Algorithm extension yields simultaneous learning of the value function and optimal control without explicit minimization. Numerical results demonstrate accurate solutions up to $d=10^4$ across quasilinear PDEs and HJB equations, with favorable runtimes and robustness to nonlinearities, highlighting practical potential for high-dimensional SOCPs.

Abstract

In this paper, a highly parallel and derivative-free martingale neural network learning method is proposed to solve Hamilton-Jacobi-Bellman (HJB) equations arising from stochastic optimal control problems (SOCPs), as well as general quasilinear parabolic partial differential equations (PDEs). In both cases, the PDEs are reformulated into a martingale formulation such that loss functions will not require the computation of the gradient or Hessian matrix of the PDE solution, while its implementation can be parallelized in both time and spatial domains. Moreover, the martingale conditions for the PDEs are enforced using a Galerkin method in conjunction with adversarial learning techniques, eliminating the need for direct computation of the conditional expectations associated with the martingale property. For SOCPs, a derivative-free implementation of the maximum principle for optimal controls is also introduced. The numerical results demonstrate the effectiveness and efficiency of the proposed method, which is capable of solving HJB and quasilinear parabolic PDEs accurately in dimensions as high as 10,000.
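
The derivative-free idea can be illustrated on a toy linear case; this is a minimal sketch under stated assumptions, not the paper's algorithm. Assume the backward heat equation $\partial_t v + \frac{1}{2}\Delta v = 0$ with terminal condition $v(T, x) = |x|^2$, whose exact solution is $v(t, x) = |x|^2 + d(T - t)$, and take a standard Brownian motion as the pilot process $X$. By Itô's formula $v(t, X_t)$ is then a martingale, so $\mathbb{E}[(v(t+h, X_{t+h}) - v(t, X_t))\,\rho(X_t)] = 0$ for every bounded test function $\rho$. The function names, the two fixed test functions, and all parameter values below are hypothetical choices made for illustration; in the adversarial setting described above, the test functions would be learned rather than fixed.

```python
import numpy as np

def martingale_residuals(v, d=10, T=1.0, t=0.5, h=0.1, n_paths=200_000, seed=0):
    """Monte Carlo estimate of E[(v(t+h, X_{t+h}) - v(t, X_t)) * rho(X_t)]
    for two fixed test functions rho, with X a standard Brownian motion
    started at the origin (an assumed pilot process)."""
    rng = np.random.default_rng(seed)
    # Sample X_t ~ N(0, t I) directly, then take one Brownian step of size h.
    x_t = np.sqrt(t) * rng.standard_normal((n_paths, d))
    x_next = x_t + np.sqrt(h) * rng.standard_normal((n_paths, d))
    # Only evaluations of v are needed: no gradient or Hessian of v.
    increment = v(t + h, x_next, T) - v(t, x_t, T)
    # Hypothetical test functions: a constant and tanh of the first coordinate.
    test_functions = [np.ones(n_paths), np.tanh(x_t[:, 0])]
    return np.array([np.mean(increment * rho) for rho in test_functions])

def v_exact(t, x, T):
    # Exact solution of v_t + 0.5 * Laplacian(v) = 0 with v(T, x) = |x|^2.
    return np.sum(x**2, axis=1) + x.shape[1] * (T - t)

def v_wrong(t, x, T):
    # Drops the time-dependent term, so it does not solve the PDE.
    return np.sum(x**2, axis=1)

res_exact = martingale_residuals(v_exact)
res_wrong = martingale_residuals(v_wrong)
print("residuals for exact solution:", res_exact)
print("residuals for wrong candidate:", res_wrong)
```

For the exact solution both residuals are zero up to Monte Carlo noise, while the wrong candidate leaves a residual close to $d\,h = 1$ against the constant test function. A loss built from such residuals therefore distinguishes the two candidates using function evaluations alone.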

Paper Structure

This paper contains 16 sections, 46 equations, 7 figures, 2 tables, 2 algorithms.

Figures (7)

  • Figure 1: Numerical results of Algorithm \ref{alg_pde} applied to the Allen-Cahn equation, Eq. (\ref{eq_acpde}), with $d = 100$. (Top) Reference and predicted values of $v(0, x_0)$ versus iteration steps, with $x_0 = \left(0, \cdots, 0\right)^{\top} \in \mathbb{R}^{100}$. The shaded region represents the mean $\pm 2 \times$ standard deviation (SD) of $v_{\theta}$ across 10 independent runs. The widths of $u_{\alpha}$ and $v_{\theta}$ are both $W = 2d + 10$. (Bottom) Relative error (RE) of $v(0, x_0)$ versus iteration steps, where the shaded region represents the mean $+ 2 \times$ SD of the RE across 10 independent runs. The mean RE and its SD reach $3.2 \times 10^{-3}$ and $2.1 \times 10^{-3}$, respectively, at the 500th iteration step, within a runtime of less than 6.8 seconds.
  • Figure 2: Numerical results of Algorithm \ref{alg_pde} for Eq. (\ref{eq_diffpde}) with $d=10^4$. (Top) Relative error (RE) versus iteration steps of Algorithm \ref{alg_pde} in solving $v(0, s\boldsymbol{1}_d)$ for $s \in S$, under different combinations of $S$ and $W$, where $W$ is the width of $v_{\theta}$. The shaded region represents the mean $+ 2 \times$ standard deviation (SD) of the RE across 5 independent runs. The mean and the SD of the RE, and the running time (RT), at the 9000th iteration step are given in Table \ref{tab_RESDRT}. (Bottom) The true and predicted values of $s \mapsto v(0, s\boldsymbol{1}_d)$ at the 9000th iteration step under Settings 2 and 3.
  • Figure 3: Numerical results of Algorithm \ref{alg_pde} for $s \mapsto v(0, s\boldsymbol{1}_d)$ from various quasilinear PDEs with $d = 10^4$. The width of $v_{\theta}$ is set to $W = d + 10$. The shaded region represents the mean $+ 2 \times$ standard deviation (SD) of the relative errors across 5 independent runs. Each run takes less than 5500 seconds.
  • Figure 4: Graphs of the true solutions of HJB-3a and HJB-3b. The orange (solid) and the blue (dashed) curves depict the mappings $s \mapsto v(0, s\boldsymbol{1}_d)$ and $s \mapsto v(T, s\boldsymbol{1}_d)$, respectively.
  • Figure 5: Numerical results of Algorithm \ref{alg_amnet} for HJB-3a with $d=2000$. Subfigures (a) and (b) show the curves of $s \mapsto v(0, s\boldsymbol{1}_d)$ for $W = d + 10$ and $W = 5d + 10$, respectively, where $W$ denotes the widths of $u_{\alpha}$ and $v_{\theta}$. The shaded region in (c) represents the mean $+ 2 \times$ SD of the RE across 5 independent runs. At the 6000th iteration step, for $W = d+10$, the mean and the SD of the RE, and the RT, are $2.1 \times 10^{-2}$, $1.7 \times 10^{-3}$ and 540s, respectively; for $W = 5d+10$, the corresponding values are $5.0 \times 10^{-3}$, $6.7 \times 10^{-4}$ and 9050s.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4